Hive table showing NULL output for a datetime column

I am creating a Hive table on a .txt file placed in an HDFS directory. When I query the data, the last column (order_dtm, a datetime) comes back as NULL. I have searched and tried the other options suggested online, but nothing has worked so far.
Hive query (the file is tab-delimited):
Create EXTERNAL table Orders(
order_id int,
cust_id int,
order_dtm TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/user/analyst/order/';
First lines of the HDFS file:
>> hdfs dfs -cat /user/analyst/order/orders.txt | head -10
17/09/15 23:46:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
5000001 1133938 06-01-2008 00:03:35
5000002 1131278 06-01-2008 00:27:42
5000003 1153459 06-01-2008 00:49:37
5000004 1159099 06-01-2008 01:05:28
5000005 1020687 06-01-2008 01:08:36
5000006 1187459 06-01-2008 01:11:09
5000007 1048773 06-01-2008 01:36:35
5000008 1064002 06-01-2008 01:36:52
5000009 1096744 06-01-2008 01:49:46
5000010 1107526 06-01-2008 03:07:14
cat: Unable to write to output stream.

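Hive's TIMESTAMP type only parses text in the form yyyy-MM-dd HH:mm:ss[.fffffffff], so values such as 06-01-2008 00:03:35 are read as NULL. One fix is to load the column as a string and convert it in a view. The table below splits fields on single spaces, as in the head output, and sets serialization.last.column.takes.rest so that the date and the time both land in order_dtm; if the file really is tab-delimited, '\t' should work without that property.
create external table orders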
(
order_id int
,cust_id int
,order_dtm string
)
row format delimited
fields terminated by ' '
location '/user/analyst/order'
tblproperties ('serialization.last.column.takes.rest'='true')
;
select * from orders
;
+-----------+----------+----------------------+
| order_id | cust_id | order_dtm |
+-----------+----------+----------------------+
| 5000001 | 1133938 | 06-01-2008 00:03:35 |
| 5000002 | 1131278 | 06-01-2008 00:27:42 |
| 5000003 | 1153459 | 06-01-2008 00:49:37 |
| 5000004 | 1159099 | 06-01-2008 01:05:28 |
| 5000005 | 1020687 | 06-01-2008 01:08:36 |
| 5000006 | 1187459 | 06-01-2008 01:11:09 |
| 5000007 | 1048773 | 06-01-2008 01:36:35 |
| 5000008 | 1064002 | 06-01-2008 01:36:52 |
| 5000009 | 1096744 | 06-01-2008 01:49:46 |
| 5000010 | 1107526 | 06-01-2008 03:07:14 |
+-----------+----------+----------------------+
create view orders_v
as
select order_id
,cust_id
,from_unixtime(to_unix_timestamp(order_dtm,'MM-dd-yyyy HH:mm:ss')) as order_dtm
from orders
;
select * from orders_v
;
+-----------+----------+----------------------+
| order_id | cust_id | order_dtm |
+-----------+----------+----------------------+
| 5000001 | 1133938 | 2008-06-01 00:03:35 |
| 5000002 | 1131278 | 2008-06-01 00:27:42 |
| 5000003 | 1153459 | 2008-06-01 00:49:37 |
| 5000004 | 1159099 | 2008-06-01 01:05:28 |
| 5000005 | 1020687 | 2008-06-01 01:08:36 |
| 5000006 | 1187459 | 2008-06-01 01:11:09 |
| 5000007 | 1048773 | 2008-06-01 01:36:35 |
| 5000008 | 1064002 | 2008-06-01 01:36:52 |
| 5000009 | 1096744 | 2008-06-01 01:49:46 |
| 5000010 | 1107526 | 2008-06-01 03:07:14 |
+-----------+----------+----------------------+
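To see why the original TIMESTAMP column came back NULL, the Python sketch below mimics the two steps with datetime.strptime. It is only an analogy, not Hive code: it assumes Hive's default text layout is yyyy-MM-dd HH:mm:ss (ignoring the optional fractional seconds), and the helper names parse_like_hive and convert are hypothetical.

```python
from datetime import datetime

# Assumption: Hive's default text layout for TIMESTAMP (fractional seconds ignored here)
HIVE_LAYOUT = "%Y-%m-%d %H:%M:%S"
# Layout of order_dtm in orders.txt, i.e. 'MM-dd-yyyy HH:mm:ss'
SOURCE_LAYOUT = "%m-%d-%Y %H:%M:%S"


def parse_like_hive(text):
    """Return a datetime, or None (standing in for Hive's NULL) if text is not in HIVE_LAYOUT."""
    try:
        return datetime.strptime(text, HIVE_LAYOUT)
    except ValueError:
        return None


def convert(text):
    """Analogue of from_unixtime(to_unix_timestamp(text, 'MM-dd-yyyy HH:mm:ss'))."""
    return datetime.strptime(text, SOURCE_LAYOUT).strftime(HIVE_LAYOUT)


raw = "06-01-2008 00:03:35"
print(parse_like_hive(raw))                   # None
print(convert(raw))                           # 2008-06-01 00:03:35
print(parse_like_hive(convert(raw)) is None)  # False
```

The raw value fails the strict parse, while the reformatted string parses cleanly, which matches the NULL in the first table and the working view.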

Related

How to split these multiple rows in SQL?

I am currently learning SQL and am still a beginner. I have a task where I need to split rows that hold multiple entries (dates and user IDs) into separate rows, and I would really appreciate some help.
+-------+------------------------------+---------------------------+
| TYPE | DATES | USER_ID |
+-------+------------------------------+---------------------------+
| WORK | ["2022-06-02", "2022-06-03"] | {74042,88357,83902,88348} |
| LEAVE | ["2022-05-16", "2022-05-26"] | {83902,74042,88357,88348} |
+-------+------------------------------+---------------------------+
The end result should look like this: each user ID should appear once for every date of its row.
+-------+------------+---------+
| TYPE | DATES | USER_ID |
+-------+------------+---------+
| LEAVE | 05/16/2022 | 74042 |
| LEAVE | 05/16/2022 | 88357 |
| LEAVE | 05/16/2022 | 88348 |
| LEAVE | 05/16/2022 | 83902 |
| LEAVE | 05/26/2022 | 74042 |
| LEAVE | 05/26/2022 | 88357 |
| LEAVE | 05/26/2022 | 88348 |
| LEAVE | 05/26/2022 | 83902 |
| WORK | 06/02/2022 | 74042 |
| WORK | 06/02/2022 | 88357 |
| WORK | 06/02/2022 | 88348 |
| WORK | 06/02/2022 | 83902 |
| WORK | 06/03/2022 | 74042 |
| WORK | 06/03/2022 | 88357 |
| WORK | 06/03/2022 | 88348 |
| WORK | 06/03/2022 | 83902 |
+-------+------------+---------+

Create table:
CREATE TABLE work_leave (
TYPE varchar,
DATES date,
USER_ID integer
);
INSERT INTO work_leave
VALUES ('LEAVE', '05/16/2022', 74042),
('LEAVE', '05/16/2022', 88357),
('LEAVE', '05/16/2022', 88348),
('LEAVE', '05/16/2022', 83902),
('LEAVE', '05/26/2022', 74042),
('LEAVE', '05/26/2022', 88357),
('LEAVE', '05/26/2022', 88348),
('LEAVE', '05/26/2022', 83902),
('WORK', '06/2/2022', 74042),
('WORK', '06/2/2022', 88357),
('WORK', '06/2/2022', 88348),
('WORK', '06/2/2022', 83902),
('WORK', '06/3/2022', 74042),
('WORK', '06/3/2022', 88357),
('WORK', '06/3/2022', 88348),
('WORK', '06/3/2022', 83902);
WITH date_ends AS (
SELECT
type,
ARRAY[min(dates),
max(dates)] AS dates
FROM
work_leave
GROUP BY
type
),
users AS (
SELECT
type,
array_agg(DISTINCT (user_id)
ORDER BY user_id) AS user_ids
FROM
work_leave
GROUP BY
type
)
SELECT
de.type,
de.dates,
u.user_ids
FROM
date_ends AS de
JOIN
users as u
ON de.type = u.type;
type | dates | user_ids
-------+-------------------------+---------------------------
LEAVE | {05/16/2022,05/26/2022} | {74042,83902,88348,88357}
WORK | {06/02/2022,06/03/2022} | {74042,83902,88348,88357}

I adjusted the data slightly for simplicity. Here's one idea:
WITH rows (type, dates, user_id) AS (
VALUES ('WORK', array['2022-06-02', '2022-06-03'], array[74042,88357,83902,88348])
, ('LEAVE', array['2022-05-16', '2022-05-26'], array[83902,74042,88357,88348])
)
SELECT r1.type, x.*
FROM rows AS r1
CROSS JOIN LATERAL (
SELECT r2.dates, r3.user_id
FROM unnest(r1.dates) AS r2(dates)
, unnest(r1.user_id) AS r3(user_id)
) AS x
;
The result:
 type  |   dates    | user_id
-------+------------+---------
 WORK  | 2022-06-02 |   74042
 WORK  | 2022-06-02 |   88357
 WORK  | 2022-06-02 |   83902
 WORK  | 2022-06-02 |   88348
 WORK  | 2022-06-03 |   74042
 WORK  | 2022-06-03 |   88357
 WORK  | 2022-06-03 |   83902
 WORK  | 2022-06-03 |   88348
 LEAVE | 2022-05-16 |   83902
 LEAVE | 2022-05-16 |   74042
 LEAVE | 2022-05-16 |   88357
 LEAVE | 2022-05-16 |   88348
 LEAVE | 2022-05-26 |   83902
 LEAVE | 2022-05-26 |   74042
 LEAVE | 2022-05-26 |   88357
 LEAVE | 2022-05-26 |   88348
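The LATERAL query pairs every date with every user ID inside the same row, i.e. a per-row Cartesian product. A minimal Python sketch of the same idea, using itertools.product on the question's sample rows (the explode helper is a hypothetical name), is:

```python
from itertools import product

# Sample rows from the question: (type, dates, user_ids)
rows = [
    ("WORK", ["2022-06-02", "2022-06-03"], [74042, 88357, 83902, 88348]),
    ("LEAVE", ["2022-05-16", "2022-05-26"], [83902, 74042, 88357, 88348]),
]


def explode(rows):
    """Yield one (type, date, user_id) tuple per date/user pair within each row,
    like unnest(dates) cross-joined with unnest(user_id) in a LATERAL subquery."""
    for row_type, dates, user_ids in rows:
        for date, user_id in product(dates, user_ids):
            yield (row_type, date, user_id)


result = list(explode(rows))
print(len(result))  # 16
print(result[0])    # ('WORK', '2022-06-02', 74042)
```

Because the product is taken per row, WORK dates are never paired with LEAVE users, which is what keeps each user ID aligned with its own row's dates.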

How to use wm_concat on a column that already exists in the query?

I am currently using Oracle 11g (11.1), and I need a query that takes the ID and CusCODE from Table_with_value and checks Table_with_status, by ID, for rows with an active Co_status but a different CusCODE.
This is what I have so far; it does not work as intended unless CusCODE and ID are provided manually:
SELECT wm_concat(CoID) as active_CO_Status_for_same_ID_but_different_CusCODE
FROM Table_with_status
WHERE
CoID IN (SELECT CoID FROM Table_with_status WHERE ID = Table_with_value.ID AND CusCODE != Table_with_value.CusCODE) AND Co_status = 'active';
Table_with_value:
|CoID | CusCODE | ID | Value |
|--------|---------|----------|----|
|354223 | 1.432 | 0784296L | 99 |
|321232 | 4.212321.22 | 0432296L | 32 |
|938421 | 3.213 | 0021321L | 93 |
Table_with_status:
|CoID | CusCODE | ID | Co_status|
|--------|--------------|----------|--------|
|354223 | 1.432 | 0784296L | active|
|354232 | 1.432 | 0784296L | inactive |
|666698 | 1.47621 | 0784296L | active |
|666700 | 1.5217 | 0784296L | active |
|938421 | 3.213 | 0021321L | active |
|938422 | 3.213 | 0021321L | active |
|938423 | 3.213 | 0021321L | active |
|321232 | 4.212321.22 | 0432296L | active |
|321232 | 4.212321.22 | 0432296L | active |
|321232 | 1.689 | 0432296L | inactive |
Expected output:
|CoID | active_CO_Status_for_same_ID_but_different_CusCODE | CusCODE | ID | Value |
|--------|---------|---------|----------|----|
|354223 | 666698,666700 | 1.432 | 0784296L | 99 |
|321232 | N/A | 4.212321.22 | 0432296L | 32 |
|938421 | N/A | 3.213 | 0021321L | 93 |
Any ideas on how this can be implemented, ideally without PL/SQL loops? A loop would be acceptable too, since the output dataset is expected to contain fewer than 300 IDs.
Apologies in advance for the cryptic way I structured the question :) Let me know if anything is unclear.

From your description and expected output, it looks like you need a left outer join, something like:
SELECT v.CoID,
wm_concat(s.CoID) as other_active_CusCODE, -- active_CO_Status_for_same_ID_but_different_CusCODE
v.CusCODE,
v.ID,
v.value
FROM Table_with_value v
LEFT JOIN Table_with_status s
ON s.ID = v.ID
AND s.CusCODE != v.CusCODE
AND s.Co_status = 'active'
GROUP BY v.CoID, v.CusCODE, v.ID, v.value;
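A SQL Fiddle version uses listagg() instead of the never-supported and now-removed wm_concat() (note that listagg() requires Oracle 11g Release 2 or later, so it is not available on 11.1), along with a couple of alternative approaches in case the logic isn't quite what I interpreted. With your sample data, they all return: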
COID OTHER_ACTIVE_CUSCODE CUSCODE ID VALUE
------ -------------------- ----------- -------- -----
321232 (null) 4.212321.22 0432296L 32
354223 666698,666700 1.432 0784296L 99
938421 (null) 3.213 0021321L 93
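The LEFT JOIN logic can be tried without an Oracle instance. The sketch below is a stand-in under stated assumptions: it loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module and uses SQLite's group_concat() in place of wm_concat()/listagg(). Unlike listagg() ... WITHIN GROUP (ORDER BY ...), group_concat() does not guarantee the order of the concatenated values.

```python
import sqlite3

# In-memory SQLite copy of the question's tables; group_concat() stands in for wm_concat()
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_with_value (CoID INTEGER, CusCODE TEXT, ID TEXT, Value INTEGER);
CREATE TABLE Table_with_status (CoID INTEGER, CusCODE TEXT, ID TEXT, Co_status TEXT);
INSERT INTO Table_with_value VALUES
  (354223, '1.432', '0784296L', 99),
  (321232, '4.212321.22', '0432296L', 32),
  (938421, '3.213', '0021321L', 93);
INSERT INTO Table_with_status VALUES
  (354223, '1.432', '0784296L', 'active'),
  (354232, '1.432', '0784296L', 'inactive'),
  (666698, '1.47621', '0784296L', 'active'),
  (666700, '1.5217', '0784296L', 'active'),
  (938421, '3.213', '0021321L', 'active'),
  (938422, '3.213', '0021321L', 'active'),
  (938423, '3.213', '0021321L', 'active'),
  (321232, '4.212321.22', '0432296L', 'active'),
  (321232, '4.212321.22', '0432296L', 'active'),
  (321232, '1.689', '0432296L', 'inactive');
""")

# All filters sit in the ON clause, so value rows without a match survive
# with a NULL aggregate instead of being dropped.
query = """
SELECT v.CoID,
       group_concat(s.CoID, ',') AS other_active_CusCODE,
       v.CusCODE, v.ID, v.Value
FROM Table_with_value v
LEFT JOIN Table_with_status s
  ON s.ID = v.ID
 AND s.CusCODE <> v.CusCODE
 AND s.Co_status = 'active'
GROUP BY v.CoID, v.CusCODE, v.ID, v.Value
ORDER BY v.CoID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

To print N/A for the unmatched rows, as in the expected output, the aggregate can be wrapped in COALESCE(..., 'N/A') (NVL in Oracle).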

Your code looks like it should work, assuming you are referring to the correct tables:
SELECT wm_concat(s.CoID) as active_CO_Status_for_same_ID_but_different_CusCODE
FROM Table_with_status s
WHERE s.CoID IN (SELECT v.CoID
FROM Table_with_value v
WHERE v.ID = s.ID AND
v.CusCODE <> s.CusCODE
) AND
s.Co_status = 'active';

PostgreSQL: show trips within a bounding box

I have a trips table containing users' trip information, like so:
select * from trips limit 10;
trip_id | daily_user_id | session_ids | seconds_start | lat_start | lon_start | seconds_end | lat_end | lon_end | distance
---------+---------------+-------------+---------------+------------+------------+-------------+------------+------------+------------------
594221 | 16772 | {170487} | 1561324555 | 41.1175475 | -8.6298934 | 1561325119 | 41.1554091 | -8.6283493 | 5875.39697884959
563097 | 7682 | {128618} | 1495295471 | 41.1782829 | -8.5950303 | 1495299137 | 41.1783908 | -8.5948965 | 5364.81067787512
596303 | 17264 | {172851} | 1578011699 | 41.5195598 | -8.6393526 | 1578012513 | 41.4614024 | -8.717709 | 11187.7956426909
595648 | 17124 | {172119} | 1575620857 | 41.1553116 | -8.6439528 | 1575621885 | 41.1621821 | -8.6383042 | 1774.83365424607
566061 | 8720 | {133624} | 1509005051 | 41.1241975 | -8.5958988 | 1509006310 | 41.1424158 | -8.6101461 | 3066.40306678979
566753 | 8947 | {134662} | 1511127813 | 41.1887996 | -8.5844238 | 1511129839 | 41.2107519 | -8.5511712 | 5264.64026582458
561179 | 7198 | {125861} | 1493311197 | 41.1776935 | -8.5947254 | 1493311859 | 41.1773815 | -8.5947254 | 771.437257541019
541328 | 2119 | {46950} | 1461103381 | 41.1779 | -8.5949738 | 1461103613 | 41.1779129 | -8.5950202 | 177.610819150637
535519 | 908 | {6016} | 1460140650 | 41.1644658 | -8.6422775 | 1460141201 | 41.1642646 | -8.6423309 | 1484.61552373019
548460 | 3525 | {102026} | 1462289206 | 41.177689 | -8.594679 | 1462289843 | 41.1734476 | -8.5916326 | 1108.05119077308
(10 rows)
The task is to filter trips that start and end within the bounding box defined by the upper-left corner (41.24895, -8.68494) and the lower-right corner (41.11591, -8.47569), given as latitude, longitude.

If I understand correctly, you can just compare the starting and ending coordinates:
select t.*
from trips t
where t.lat_start >= 41.11591 and t.lat_start <= 41.24895 and
t.lat_end >= 41.11591 and t.lat_end <= 41.24895 and
t.lon_start >= -8.68494 and t.lon_start <= -8.47569 and
t.lon_end >= -8.68494 and t.lon_end <= -8.47569;

Since your coordinates are stored in x,y columns, you have to use ST_MakePoint to create a proper geometry. After that, you can create a BBOX using the function ST_MakeEnvelope and check if start and end coordinates are inside the BBOX using ST_Contains, e.g.
WITH bbox(geom) AS (
VALUES (ST_MakeEnvelope(-8.68494,41.11591,-8.47569,41.24895,4326)) -- xmin, ymin, xmax, ymax, SRID
)
SELECT * FROM trips,bbox
WHERE
ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_start,lat_start),4326)) AND
ST_Contains(bbox.geom,ST_SetSRID(ST_MakePoint(lon_end,lat_end),4326));
Note: the CTE isn't really necessary and is in the query just for illustration purposes. You can repeat the ST_MakeEnvelope function on both conditions in the WHERE clause instead of bbox.geom. This query also assumes the SRS WGS84 (4326).
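Both answers reduce to the same test: a point is inside the box when its latitude and its longitude both fall within the box's ranges. The plain Python sketch below applies that check to the ten sample trips; the BBox type and the inside helper are hypothetical names. It uses inclusive bounds like the first answer, whereas ST_Contains would exclude a point lying exactly on the box edge.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned box in degrees."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


# Upper left (41.24895, -8.68494), lower right (41.11591, -8.47569), from the question
BOX = BBox(min_lat=41.11591, max_lat=41.24895, min_lon=-8.68494, max_lon=-8.47569)


def inside(box, lat, lon):
    """Inclusive range check, like the >= / <= predicates in the SQL answer."""
    return box.min_lat <= lat <= box.max_lat and box.min_lon <= lon <= box.max_lon


# (trip_id, lat_start, lon_start, lat_end, lon_end) from the sample output
trips = [
    (594221, 41.1175475, -8.6298934, 41.1554091, -8.6283493),
    (563097, 41.1782829, -8.5950303, 41.1783908, -8.5948965),
    (596303, 41.5195598, -8.6393526, 41.4614024, -8.717709),
    (595648, 41.1553116, -8.6439528, 41.1621821, -8.6383042),
    (566061, 41.1241975, -8.5958988, 41.1424158, -8.6101461),
    (566753, 41.1887996, -8.5844238, 41.2107519, -8.5511712),
    (561179, 41.1776935, -8.5947254, 41.1773815, -8.5947254),
    (541328, 41.1779, -8.5949738, 41.1779129, -8.5950202),
    (535519, 41.1644658, -8.6422775, 41.1642646, -8.6423309),
    (548460, 41.177689, -8.594679, 41.1734476, -8.5916326),
]

# Keep a trip only if both its start and its end point are inside the box
kept = [t[0] for t in trips if inside(BOX, t[1], t[2]) and inside(BOX, t[3], t[4])]
print(kept)  # every sample trip except 596303
```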

PostgreSQL: Get elapsed amount between integers

I'm trying to get the difference between the first start_time and the last stop_time in the table. But I can't seem to get this done in one query. Can somebody help me? This is some sample data:
start_time | stop_time
-------------------+-------------------
1398871312.769668 | 1398871312.769676
1398871312.771368 | 1398871312.771429
1398871312.771471 | 1398871312.771476
1398871312.771494 | 1398871312.771543
1398871312.781109 | 1398871312.781115
1398871312.781150 | 1398871312.781154
1398871312.781233 | 1398871312.781282
1398871312.992759 | 1398871312.992765
1398871312.992795 | 1398871312.992798
1398871312.992832 | 1398871312.992881
1398871313.3387 | 1398871313.3399
1398871313.3435 | 1398871313.3440
1398871313.3703 | 1398871313.3745
1398871313.203462 | 1398871313.203469
1398871313.203497 | 1398871313.203501
1398871313.203560 | 1398871313.203600
1398871313.214120 | 1398871313.214127
1398871313.214153 | 1398871313.214158
1398871313.214177 | 1398871313.214192
1398871313.214208 | 1398871313.214248
1398871313.415027 | 1398871313.415035
1398871313.415136 | 1398871313.415140
1398871313.415218 | 1398871313.415226
1398871313.415252 | 1398871313.415265
1398871313.415290 | 1398871313.415298
1398871313.415332 | 1398871313.415339
1398871313.415350 | 1398871313.415362
1398871314.144867 | 1398871314.144886
1398871314.144896 | 1398871314.144901
1398871314.144906 | 1398871314.144912
1398871314.144918 | 1398871314.144923
1398871314.144927 | 1398871314.144931
1398871314.144935 | 1398871314.144939
1398871314.144965 | 1398871314.144974
1398871314.145055 | 1398871314.145060
1398871314.145138 | 1398871314.145146
1398871314.145152 | 1398871314.145158
1398871314.145166 | 1398871314.145173
1398871314.145211 | 1398871314.145215
1398871314.145235 | 1398871314.145243
1398871314.145247 | 1398871314.145252
1398871314.145262 | 1398871314.145267
1398871314.145307 | 1398871314.145314
1398871314.145547 | 1398871314.145551
1398871314.145563 | 1398871314.145571
1398871314.145576 | 1398871314.145581
1398871314.145586 | 1398871314.145590
1398871314.145600 | 1398871314.145606
1398871314.145611 | 1398871314.145618
1398871314.145623 | 1398871314.145627
1398871314.145634 | 1398871314.145641
1398871314.145999 | 1398871314.146003
1398871314.146014 | 1398871314.146022
1398871314.146026 | 1398871314.146033
1398871314.146043 | 1398871314.146050
1398871314.146140 | 1398871314.146145
1398871314.146160 | 1398871314.146168
1398871314.146178 | 1398871314.146185
So I want the difference between the earliest start_time (1398871312.769668) and the latest stop_time (1398871314.146185), in one query. Is this possible? Thanks!

Something like this?
select max(stop_time) - min(start_time) as elapsed
from Table1;
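Because max() and min() are aggregates, the row order does not matter (the sample is not fully sorted). The Python sketch below applies the same max(stop_time) - min(start_time) to three rows of the sample, including the rows that hold the earliest start and the latest stop. It uses Decimal on the assumption that the columns are numeric rather than floating point, so the result is exact.

```python
from decimal import Decimal

# Three (start_time, stop_time) pairs from the sample; the first and last
# hold the earliest start and the latest stop of the whole sample.
rows = [
    ("1398871312.769668", "1398871312.769676"),
    ("1398871313.203462", "1398871313.203469"),
    ("1398871314.146178", "1398871314.146185"),
]


def elapsed(pairs):
    """Return max(stop_time) - min(start_time) as an exact Decimal."""
    starts = [Decimal(start) for start, _ in pairs]
    stops = [Decimal(stop) for _, stop in pairs]
    return max(stops) - min(starts)


print(elapsed(rows))  # 1.376517
```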

Casting a REAL as INT and comparing

I am casting a real to an int and a float to an int and comparing the two like this:
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
But I am still getting results where the two are equal, for example:
+-----------+-----------+------------+------------+----------+
| accn | load_dt | pmtdt | sumpaidamt | Bpaidamt |
+-----------+-----------+------------+------------+----------+
| A133312 | 6/7/2011 | 11/28/2011 | 98.39 | 98.39 |
| A445070 | 6/2/2011 | 9/22/2011 | 204.93 | 204.93 |
| A465606 | 5/19/2011 | 10/19/2011 | 560.79 | 560.79 |
| A508742 | 7/12/2011 | 10/19/2011 | 279.65 | 279.65 |
| A567730 | 5/27/2011 | 10/24/2011 | 212.76 | 212.76 |
| A617277 | 7/12/2011 | 10/12/2011 | 322.02 | 322.02 |
| A626384 | 6/16/2011 | 10/21/2011 | 415.84 | 415.84 |
| AA0000044 | 5/12/2011 | 5/23/2011 | 197.38 | 197.38 |
+-----------+-----------+------------+------------+----------+
Here is the full query:
select
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)] sumpaidamt,
sum(b.paid_amt) Bpaidamt
from
[MILLENNIUM_DW_DEV].[dbo].[Millennium_Payment_Data_May2011_July2012] a
join
F_PAYOR_PAYMENTS_DAILY b
on
a.accn=b.ACCESSION_ID
and
a.final_rpt_dt=b.FINAL_REPORT_DATE
and
a.load_dt=b.LOAD_DATE
and
a.pmtdt=b.PAYMENT_DATE
where
cast(a.[SUM(PAID_AMT)] as int)!=cast(b.PAID_AMT as int)
group by
a.accn,
a.load_dt,
a.pmtdt,
a.[SUM(PAID_AMT)]
What am I doing wrong? How do I return only the records that are NOT equal?

I don't see why there is an issue.
The query returns the sum of the payments in b (sum(b.paid_amt) Bpaidamt), but the WHERE clause compares individual payments before grouping. A group therefore survives whenever any single payment differs from the stored total, which simply means there is more than one payment for that key.
Perhaps your intention is to have a HAVING clause instead:
having cast(a.[SUM(PAID_AMT)] as int)!=cast(sum(b.PAID_AMT) as int)
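The difference between filtering in WHERE and in HAVING can be reproduced with Python's sqlite3 module. The sketch below uses simplified tables a and b and made-up data: the question's A133312 total split into two hypothetical payments, plus an invented account X1 whose payments fall short. SQLite's CAST(... AS INTEGER) truncates toward zero, as the T-SQL cast to int does.

```python
import sqlite3

# Hypothetical data: A133312's total (98.39) split into two payments,
# plus an invented account X1 whose single payment does not match its total.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (accn TEXT, sumpaidamt REAL);
CREATE TABLE b (accn TEXT, paid_amt REAL);
INSERT INTO a VALUES ('A133312', 98.39), ('X1', 120.00);
INSERT INTO b VALUES ('A133312', 50.00), ('A133312', 48.39), ('X1', 100.00);
""")

# WHERE runs before grouping: each single payment is compared with the total.
where_rows = conn.execute("""
SELECT a.accn, a.sumpaidamt, sum(b.paid_amt)
FROM a JOIN b ON a.accn = b.accn
WHERE CAST(a.sumpaidamt AS INTEGER) <> CAST(b.paid_amt AS INTEGER)
GROUP BY a.accn, a.sumpaidamt
ORDER BY a.accn
""").fetchall()

# HAVING runs after grouping: the total is compared with the summed payments.
having_rows = conn.execute("""
SELECT a.accn, a.sumpaidamt, sum(b.paid_amt)
FROM a JOIN b ON a.accn = b.accn
GROUP BY a.accn, a.sumpaidamt
HAVING CAST(a.sumpaidamt AS INTEGER) <> CAST(sum(b.paid_amt) AS INTEGER)
ORDER BY a.accn
""").fetchall()

print([r[0] for r in where_rows])   # ['A133312', 'X1']
print([r[0] for r in having_rows])  # ['X1']
```

The WHERE version reports A133312 even though its payments add up, which is the symptom in the question; the HAVING version keeps only the genuine mismatch.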

You can round both values and cast them before comparing:
cast(round(sumpaidamt,2) as money) <> cast(round(Bpaidamt,2) as money)
SQL Fiddle showing how it works: http://sqlfiddle.com/#!3/4eb79/1
