timestamp is inserted as null when I convert it from string - hive

I have 2 tables
CREATE TABLE table1(
ctn string
, platform_code string
, status string
, status_date date
)
PARTITIONED BY(time_key string)
STORED AS ORC;
and
CREATE TABLE table2(
time_key timestamp
, ctn string
, platform_code string
, status string
, status_date timestamp
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE;
In table1 I have some data, for example:
(9056697563,C,,,2017-09-22)
status is empty string and status_date is null. I need to insert this row while converting 2017-09-22 to timestamp. When I try this
SELECT from_unixtime(unix_timestamp(time_key, 'yyyy-MM-dd')) AS time_key
, ctn
, platform_code
, status
, from_unixtime(unix_timestamp(time_key, 'yyyy-MM-dd')) AS status_date
FROM table1
WHERE time_key = '2017-09-22';
both time_key and status_date have value 2017-09-22 00:00:00, but when I try
INSERT OVERWRITE TABLE table2
SELECT from_unixtime(unix_timestamp(time_key, 'yyyy-MM-dd')) AS time_key
, ctn
, platform_code
, status
, from_unixtime(unix_timestamp(time_key, 'yyyy-MM-dd')) AS status_date
FROM table1
WHERE time_key = '2017-09-22';
time_key ends up as NULL while status_date is 2017-09-22 00:00:00.
Why does this happen?
EDIT
After I removed AS time_key and AS status_date it worked properly. Does anybody know why?
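A possible workaround, offered as an untested sketch rather than an explanation of the root cause: Hive matches the columns of INSERT ... SELECT to the target table by position, not by alias, so the aliases are not needed for the insert to land in the right columns. The aliases time_key_ts and status_date_ts below are hypothetical names, chosen only so that they do not collide with the partition column time_key used in the WHERE clause. The explicit CAST to timestamp is likewise an assumption; it makes the string-to-timestamp conversion visible instead of leaving it to the insert.

```sql
-- Sketch only: same conversion as in the question, but with aliases that
-- differ from the source column names (time_key_ts and status_date_ts are
-- made-up names) and an explicit cast to the target column type.
INSERT OVERWRITE TABLE table2
SELECT CAST(from_unixtime(unix_timestamp(time_key, 'yyyy-MM-dd')) AS timestamp) AS time_key_ts
     , ctn
     , platform_code
     , status
     , CAST(from_unixtime(unix_timestamp(time_key, 'yyyy-MM-dd')) AS timestamp) AS status_date_ts
FROM table1
WHERE time_key = '2017-09-22';
```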

Aggregate By month from date data in Big Query

I have a data table with three columns -
Date, Order Amount, Branch id
Date Format in the date column - yyyy-mm-dd 00:00:00
I want the information to be aggregated in MM.YY format.
I tried the format_date and group by functions, but was unable to get the code to run. Any help would be highly appreciated.
Try this one assuming that Date column has a date-formatted string.
WITH sample AS (
SELECT '2022-05-22 00:00:00' AS `Date`, 100 AS OrderAmount, 1 AS BranchID
UNION ALL
SELECT '2022-05-21 00:00:00' AS `Date`, 200 AS OrderAmount, 1 AS BranchID
UNION ALL
SELECT '2022-04-22 00:00:00' AS `Date`, 150 AS OrderAmount, 2 AS BranchID
UNION ALL
SELECT '2022-04-21 00:00:00' AS `Date`, 250 AS OrderAmount, 2 AS BranchID
)
SELECT BranchID, FORMAT_DATE('%m.%y', DATE(LEFT(`Date`, 10))) AS mmyy, SUM(OrderAmount) OrderAmounts
FROM sample
GROUP BY 1, 2
;
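output (row order may differ):
BranchID  mmyy   OrderAmounts
--------  -----  ------------
1         05.22  300
2         04.22  400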
Consider below option/example
SELECT BranchID,
FORMAT_DATE('%m.%y', DATE(Date)) AS mmyy,
SUM(OrderAmount) AS OrderAmounts
FROM sample
GROUP BY 1, 2
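One thing to watch with MM.YY labels is that they sort as text, so "12.21" comes after "05.22". A minimal sketch of one way around that, assuming the Date column is a string as in the first answer: group on the first day of the month and format only in the final SELECT. The month_start name, the monthly CTE and the 2021-12-05 sample row are illustrative additions, and PARSE_DATE is used here in place of DATE(...) purely to make the string parsing explicit.

```sql
WITH sample AS (
  SELECT '2022-05-22 00:00:00' AS `Date`, 100 AS OrderAmount, 1 AS BranchID
  UNION ALL
  SELECT '2022-04-21 00:00:00', 250, 2
  UNION ALL
  SELECT '2021-12-05 00:00:00', 75, 1   -- hypothetical row from an earlier year
),
monthly AS (
  -- Reduce every row to the first day of its month.
  SELECT BranchID,
         DATE_TRUNC(PARSE_DATE('%Y-%m-%d', LEFT(`Date`, 10)), MONTH) AS month_start,
         OrderAmount
  FROM sample
)
SELECT BranchID,
       FORMAT_DATE('%m.%y', month_start) AS mmyy,
       SUM(OrderAmount) AS OrderAmounts
FROM monthly
GROUP BY BranchID, month_start
ORDER BY month_start, BranchID;   -- chronological order, not text order
```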

Why group by date is returning multiple rows for the same date?

I have a query like the following.
select some_date_col, count(*) as cnt
from <the table>
group by some_date_col
I get something like that at the output.
13-12-2021, 6
13-12-2021, 8
13-12-2021, 9
....
How is that possible? Here some_date_col is of type Date.
A DATE is a binary data-type that is composed of 7 bytes (century, year-of-century, month, day, hour, minute and second) and will always have those components.
The user interface you use to access the database can choose to display some or all of those components of the binary representation of the DATE; however, regardless of whether or not they are displayed by the UI, all the components are always stored in the database and used in comparisons in queries.
When you GROUP BY a DATE column, you aggregate values that are identical down to the second, regardless of how much of the value the user interface displays.
So, if you have the data:
CREATE TABLE the_table (some_date_col) AS
SELECT DATE '2021-12-13' FROM DUAL CONNECT BY LEVEL <= 6 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' SECOND FROM DUAL CONNECT BY LEVEL <= 8 UNION ALL
SELECT DATE '2021-12-13' + INTERVAL '1' MINUTE FROM DUAL CONNECT BY LEVEL <= 9;
Then the query:
SELECT TO_CHAR(some_date_col, 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY some_date_col;
Will output:
SOME_DATE_COL        CNT
-------------------  ---
2021-12-13 00:01:00    9
2021-12-13 00:00:01    8
2021-12-13 00:00:00    6
The values are grouped according to equal values (down to the maximum precision stored in the date).
If you want to GROUP BY dates with the same date component but any time component then use the TRUNCate function (which returns a value with the same date component but the time component set to midnight):
SELECT TRUNC(some_date_col) AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY TRUNC(some_date_col)
Which, for the same data outputs:
SOME_DATE_COL  CNT
-------------  ---
13-DEC-21       23
And:
SELECT TO_CHAR(TRUNC(some_date_col), 'YYYY-MM-DD HH24:MI:SS') AS some_date_col,
count(*) as cnt
FROM the_table
GROUP BY TRUNC(some_date_col)
Outputs:
SOME_DATE_COL        CNT
-------------------  ---
2021-12-13 00:00:00   23
db<>fiddle here
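As a quick diagnostic, a sketch like the following can show whether a DATE column actually carries non-midnight times before you decide to group on TRUNC. It reuses the_table and some_date_col from the example above; the rows_with_time alias is just an illustrative name.

```sql
-- Count the rows whose time-of-day part is not 00:00:00.
-- TRUNC(some_date_col) resets the time to midnight, so any row that
-- differs from its truncated value has a hidden time component.
SELECT COUNT(*) AS rows_with_time
FROM   the_table
WHERE  some_date_col <> TRUNC(some_date_col);
```

For the sample data created above this counts 17 rows (the 8 rows offset by one second plus the 9 rows offset by one minute).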
Oracle date type holds a date and time component. If the time components do not match, grouping by that value will place the same date (with different times) in different groups:
The fiddle
CREATE TABLE test ( xdate date );
INSERT INTO test VALUES (current_date);
INSERT INTO test VALUES (current_date + INTERVAL '1' MINUTE);
With the default display format:
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
Result:
XDATE      COUNT(*)
---------  --------
13-DEC-21         1
13-DEC-21         1
Now alter the format and rerun:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MON-DD HH24:MI:SS';
SELECT xdate, COUNT(*) FROM test GROUP BY xdate;
The result
XDATE                 COUNT(*)
--------------------  --------
2021-DEC-13 23:29:36         1
2021-DEC-13 23:30:36         1
Also try this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted FROM test;
Result:
FORMATTED
2021-DEC-13 23:29:36
2021-DEC-13 23:30:36
and this:
SELECT to_char(xdate, 'YYYY-MON-DD HH24:MI:SS') AS formatted, COUNT(*) FROM test GROUP BY xdate;
Result:
FORMATTED             COUNT(*)
--------------------  --------
2021-DEC-13 23:29:36         1
2021-DEC-13 23:30:36         1

Partitioning by specific value and then by range in Oracle

I have a table with the following columns:
CREATE TABLE CUST_HISTORY (
ID NUMBER,
PRD_CNT NUMBER,
DATE_TO DATE
)
Now, I would like to apply the following partitioning strategy:
all values where DATE_TO = '9999-12-31' should be assigned to one partition called "p_max"
all remaining values of DATE_TO should be partitioned by monthly intervals (from DATE_TO)
Any hints?
From this answer:
CREATE TABLE CUST_HISTORY (
ID NUMBER,
PRD_CNT NUMBER,
DATE_TO DATE
)
PARTITION BY RANGE (date_to)
INTERVAL (INTERVAL '1' MONTH)
(PARTITION p_first VALUES LESS THAN ( DATE '2019-01-01' ) );
db<>fiddle
If you particularly want the partition to be named p_max, you can use a virtual column to remap the DATE_TO value from a high value to a low value, so that you can name the partition and then use range intervals:
CREATE TABLE CUST_HISTORY (
ID NUMBER,
PRD_CNT NUMBER,
DATE_TO DATE CHECK ( DATE_TO >= DATE '1900-01-01' ),
remapped_date_to DATE
GENERATED ALWAYS AS
( CASE WHEN date_to = DATE '9999-12-31' THEN DATE '0001-01-01' ELSE date_to END )
VIRTUAL
)
PARTITION BY RANGE (remapped_date_to)
INTERVAL (INTERVAL '1' MONTH)
(PARTITION p_max VALUES LESS THAN ( DATE '1900-01-01' ) );
db<>fiddle
or use AUTOMATIC LIST partitioning (Oracle 12c Release 2 or later) with a virtual column:
CREATE TABLE CUST_HISTORY (
ID NUMBER,
PRD_CNT NUMBER,
DATE_TO DATE,
month_to DATE
-- INVISIBLE
GENERATED ALWAYS AS
( CASE WHEN date_to = DATE '9999-12-31' THEN date_to ELSE TRUNC( date_to, 'MM' ) END )
VIRTUAL
)
PARTITION BY LIST ( month_to ) AUTOMATIC
( PARTITION p_max VALUES ( DATE '9999-12-31' ) );
db<>fiddle
(If you want you can also make the virtual column INVISIBLE)
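To check where rows end up, a short sketch along these lines may help. It assumes the RANGE/INTERVAL variant above (the one with remapped_date_to); the inserted values are made up for illustration. Because remapped_date_to is a virtual column, the INSERT statements list the real columns explicitly.

```sql
-- One open-ended row and one ordinary row (sample values are hypothetical).
INSERT INTO cust_history (id, prd_cnt, date_to) VALUES (1, 10, DATE '9999-12-31');
INSERT INTO cust_history (id, prd_cnt, date_to) VALUES (2, 20, DATE '2020-03-15');

-- Rows remapped to DATE '0001-01-01' land in the named partition.
SELECT id, date_to FROM cust_history PARTITION (p_max);

-- Interval partitions get system-generated names, so address them
-- by a partition key value instead of by name.
SELECT id, date_to FROM cust_history PARTITION FOR (DATE '2020-03-15');

-- List the partitions that exist so far.
SELECT partition_name, high_value
FROM   user_tab_partitions
WHERE  table_name = 'CUST_HISTORY'
ORDER  BY partition_position;
```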
The solution with monthly partitioning (please observe last_day function):
CREATE TABLE CUST_HISTORY (
ID NUMBER,
PRD_CNT NUMBER,
DATE_TO DATE,
month_to DATE
GENERATED ALWAYS AS
( CASE WHEN date_to = DATE '9999-12-31' THEN date_to ELSE TRUNC( last_day(date_to) ) END )
VIRTUAL
)
PARTITION BY LIST ( month_to ) AUTOMATIC
( PARTITION p_max VALUES ( DATE '9999-12-31' ) );
DB<>FIDDLE
Thank you for the inspiration, @MT0!

Oracle column with Timestamp(6) data type extract using ssis

I am trying to pull data from an Oracle table into SQL Server using SSIS. I have a package variable that holds the source query to be run against the Oracle database. I have a data flow task with an OLE DB source (Oracle) and an OLE DB destination (SQL Server).
The oledb source query(variable) is as follows
Select A,B,C
From "Table" TT
Where C in (Select coalesce(ab,cd) as c
From "Table" T2
Where Last_Upd_Dt >= '2018-09-24 12:00:00')
The column Last_Upd_Dt is a TimeStamp(6) with default value of LocalTimeStamp in the source oracle DB
My question is: in what format should my input parameter value be so that I don't have to convert the Last_Upd_Dt column with TO_DATE(), TO_CHAR(), etc.?
If I run that query using SSIS I get
ORA-01843: not a valid month
Just use
... Where Last_Upd_Dt >= to_date('2018-09-24 12:00:00','yyyy-mm-dd hh24:mi:ss')
or
... Where Last_Upd_Dt >= timestamp'2018-09-24 12:00:00'
You may refer to the following demonstration (the column time stands in for Last_Upd_Dt):
SQL> create table tab ( id int not null, time timestamp(6) default LocalTimeStamp not null );
Table created
SQL> insert into tab(id) values(1);
1 row inserted
SQL> select * from tab t;
ID TIME
-- --------------------------
1 26/09/2018 08:23:23,068025
SQL> select * from tab where time >= to_date('2018-09-24 12:00:00','yyyy-mm-dd hh24:mi:ss');
ID TIME
-- --------------------------
1 26/09/2018 08:23:23,068025
SQL> select * from tab where time >= timestamp'2018-09-24 12:00:00';
ID TIME
-- --------------------------
1 26/09/2018 08:23:23,068025
SQL> select * from tab where time >= to_date('2018-09-26 12:00:00','yyyy-mm-dd hh24:mi:ss');
ID TIME
-- --------------------------
--> no rows selected
Oracle supports the DATE and TIMESTAMP keywords. You can express the logic as:
where Last_Upd_Dt >= TIMESTAMP '2018-09-24 12:00:00'
If you did not have a time component, you would do:
where Last_Upd_Dt >= DATE '2018-09-24'
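Applied to the query from the question, this might look like the sketch below. A bare string compared with a TIMESTAMP column is converted implicitly using the session's NLS settings, which is the usual source of ORA-01843; a TIMESTAMP literal or an explicit format mask avoids depending on those settings. The second variant is an assumption about how the SSIS variable might be assembled when the cutoff arrives as text.

```sql
-- Variant 1: ANSI TIMESTAMP literal, independent of NLS settings.
SELECT A, B, C
FROM   "Table" TT
WHERE  C IN (SELECT COALESCE(ab, cd) AS c
             FROM   "Table" T2
             WHERE  Last_Upd_Dt >= TIMESTAMP '2018-09-24 12:00:00')

-- Variant 2: explicit format mask, if the cutoff is supplied as a string.
SELECT A, B, C
FROM   "Table" TT
WHERE  C IN (SELECT COALESCE(ab, cd) AS c
             FROM   "Table" T2
             WHERE  Last_Upd_Dt >= TO_TIMESTAMP('2018-09-24 12:00:00',
                                                'YYYY-MM-DD HH24:MI:SS'))
```

Either way the conversion is applied to the constant, not to the Last_Upd_Dt column, so the column itself is compared as stored.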

Comparing values sql

I have a table from which I have to report the present status and the date from which this status has applied.
Example:
Status date
1 26 July
1 24 July
1 22 July
2 21 July
2 19 July
1 16 July
0 14 July
Given this, I want to display the current status as 1 and the date as 22 July. I am not sure how to go about this.
Status date
1 25 July
1 24 July
1 20 July
In this case, I want to show the status as 1 and date as 20th July
This should pull what you need using very standard SQL:
-- Get the oldest date that is the current Status
select Status, min(date) as date
from MyTable
where date > (
-- Get the most recent date that isn't the current Status
select max(date)
from MyTable
where Status != (
-- Get the current Status
select Status -- May need max/min here for multiple statuses on same date
from MyTable
where date = (
-- Get the most recent date
select max(date)
from MyTable
)
)
)
group by Status
I'm assuming that the date column is of a data type suitable for sorting properly (as in, not a string, unless you can cast it).
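One edge case worth noting: when the status has never changed (the second example in the question), the subquery that looks for the most recent date with a different status finds no rows and returns NULL, so the comparison date > NULL filters out every row. A minimal sketch of a variant that covers that case, using the same MyTable, Status and date names as the query above and assuming at most one row per date:

```sql
-- Keep only rows that have no later row with a different Status.
-- Those rows are exactly the current, unbroken run of the latest Status,
-- so the earliest of them is the date the current Status took effect.
select Status, min(date) as date
from MyTable t
where not exists (
    select 1
    from MyTable newer
    where newer.date > t.date
      and newer.Status <> t.Status
)
group by Status
```

With the first sample this keeps 26, 24 and 22 July, and with the second it keeps all three rows, because no later row ever contradicts them.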
This is a little inelegant, but it should work
SELECT status, date
FROM my_table t
WHERE status = ALL (SELECT status
FROM my_table
WHERE date = ALL(SELECT MAX(date) FROM my_table))
AND date = ALL (SELECT MIN(date)
FROM my_table t1
WHERE t1.status = t.status
AND NOT EXISTS (SELECT *
FROM my_table t2
WHERE t2.date > t1.date AND t2.status <> t1.status))
Another option is to use a window function like LEAD (or LAG depending on how you order your results). In this example we mark the row when the status changes with the date, order the results and exclude rows other than the first one:
with test_data as (
select 1 status, date '2012-07-26' status_date from dual union all
select 1 status, date '2012-07-24' status_date from dual union all
select 1 status, date '2012-07-22' status_date from dual union all
select 2 status, date '2012-07-21' status_date from dual union all
select 2 status, date '2012-07-19' status_date from dual union all
select 1 status, date '2012-07-16' status_date from dual union all
select 0 status, date '2012-07-14' status_date from dual)
select status, as_of
from (
select status
, case when status != lead(status) over (order by status_date desc) then status_date else null end as_of
from test_data
order by as_of desc nulls last
)
where rownum = 1;
Addendum:
The LEAD and LAG functions accept two more parameters: offset and default. The offset defaults to 1, and default defaults to null. The default allows you to determine what value to consider when you are at the beginning or end of the result set. In your case when the status has never changed, a default is needed. In this example I supplied -1 as a status default because I am assuming that status value is not part of your expected set:
with test_data as (
select 1 status, date '2012-07-25' status_date from dual union all
select 1 status, date '2012-07-24' status_date from dual union all
select 1 status, date '2012-07-20' status_date from dual)
select status, as_of
from (
select status
, case when status != lead(status,1,-1) over (order by status_date desc) then status_date else null end as_of
from test_data
order by as_of desc nulls last
)
where rownum = 1;
You can play around with the case condition (equals/not equals), the order by clause in the lead function, and the desired default to accomplish your needs.