Impala - Convert MON-YY to YYYYMM - hive

I have a column Month_year in a staging table containing the data below. Please suggest a query to get the desired output.
Input:
+------------+
| month_year |
+------------+
| Jan-19     |
| Dec-18     |
+------------+
Expected Output:
+------------+
| month_year |
+------------+
| 201901     |
| 201812     |
+------------+
Thanks in Advance!

If the month_year column is in MMM-yy format, concatenate '01' as a default day to avoid a NULL result. This does not affect the expected output, because the result is formatted as yyyyMM.
Try this:
select from_unixtime(unix_timestamp(concat('01-',month_year),'dd-MMM-yy'),'yyyyMM') as month_year
from your_staging_table; -- placeholder name; use your actual staging table
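The same conversion can be checked outside the database. The following is a minimal Python sketch, not the Impala/Hive implementation: the function name mon_yy_to_yyyymm is made up for illustration, and it assumes English month abbreviations (Python's default C locale).

```python
from datetime import datetime


def mon_yy_to_yyyymm(value: str) -> str:
    """Convert a 'Mon-yy' string such as 'Jan-19' to 'yyyyMM'.

    Hypothetical helper for illustration only. It mirrors the query above:
    a default day '01' is prepended so the string is a complete date.
    """
    # %b parses the abbreviated month name (locale-dependent; English assumed).
    parsed = datetime.strptime("01-" + value, "%d-%b-%y")
    return parsed.strftime("%Y%m")


if __name__ == "__main__":
    for sample in ("Jan-19", "Dec-18"):
        print(sample, "->", mon_yy_to_yyyymm(sample))
```

The default day only makes the input parseable; it is dropped again by the output format.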

Postgres Array Issue

I have a table as below and want the output to be loaded the data into another table:
Input Table Data(Tempabc):
ID,COURSE,ENROLL_DT
'12345fgh-2bce-467f',array['BB','TT',''],array['01/07/2007 12:00:00 AM','15/09/2007 12:00:00 AM',''],
'1234rty-863d-4e4f',array['CRKT','HKY',''],array['01/01/2005 12:00:00 AM','01/07/2012 12:00:00 AM','']
Output Data:
ID,COURSE,ENROLL_DT
'12345fgh-2bce-467f',array['BB','TT'],array['01/07/2007','15/09/2007'],
'1234rty-863d-4e4f',array['CRKT','HKY'],array['01/01/2005','01/07/2012']
Can you please help? I have used the query below but am unable to extract the date from the third column. The third column is a varchar array when imported from a file, but I want to load it into a target table where it is a date array column:
SELECT ID,
ARRAY_REMOVE(COURSE,'') AS COURSE,ARRAY_REMOVE(ENROLL_DT,'') AS ENROLL_DT
FROM TEMPABC;
However, I am still unable to extract the date from the ENROLL_DT column. Is there a way to extract the date? Can someone please suggest one?
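If you want to remove the blank elements of the arrays and change their data type, you could use array_remove, unnest the result, cast the values and finally group them again with array_agg (the ::date cast reads '15/09/2007' as day/month/year only when the session's DateStyle uses DMY ordering), e.g.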
WITH tempabc (id,course,enroll_dt) AS (
VALUES
('12345fgh-2bce-467f',array['BB','TT',''],array['01/07/2007 12:00:00 AM','15/09/2007 12:00:00 AM','']),
('1234rty-863d-4e4f',array['CRKT','HKY',''],array['01/01/2005 12:00:00 AM','01/07/2012 12:00:00 AM',''])
)
SELECT id, array_agg(course) AS course, array_agg(enroll_dt) AS enroll_dt FROM (
SELECT id,
unnest(array_remove(course,'')) AS course,
unnest(array_remove(enroll_dt,''))::date AS enroll_dt
FROM tempabc) q
GROUP BY id;
id | course | enroll_dt
--------------------+------------+-------------------------
12345fgh-2bce-467f | {BB,TT} | {2007-07-01,2007-09-15}
1234rty-863d-4e4f | {CRKT,HKY} | {2005-01-01,2012-07-01}
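The two steps applied to each array (drop the blank elements, then convert the remaining strings to dates) can be sketched in Python to make the day/month/year assumption explicit. This is only an illustration of the logic, not PostgreSQL code; the function name clean_dates is hypothetical and the format string is inferred from the sample data.

```python
from datetime import date, datetime


def clean_dates(raw):
    """Drop blank entries and parse the rest as day/month/year dates.

    Hypothetical helper: the format below is assumed from the sample
    values such as '15/09/2007 12:00:00 AM'.
    """
    return [
        datetime.strptime(item, "%d/%m/%Y %I:%M:%S %p").date()
        for item in raw
        if item != ""  # same effect as array_remove(enroll_dt, '')
    ]


if __name__ == "__main__":
    sample = ["01/07/2007 12:00:00 AM", "15/09/2007 12:00:00 AM", ""]
    print(clean_dates(sample))
```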
If you're aiming to create a record for each array value, just use array_remove and unnest, e.g.
WITH tempabc (id,course,enroll_dt) AS (
VALUES
('12345fgh-2bce-467f',array['BB','TT',''],array['01/07/2007 12:00:00 AM','15/09/2007 12:00:00 AM','']),
('1234rty-863d-4e4f',array['CRKT','HKY',''],array['01/01/2005 12:00:00 AM','01/07/2012 12:00:00 AM',''])
)
SELECT id,
unnest(array_remove(course,'')) AS course,
unnest(array_remove(enroll_dt,''))::date AS enroll_dt
FROM tempabc;
id | course | enroll_dt
--------------------+--------+------------
12345fgh-2bce-467f | BB | 2007-07-01
12345fgh-2bce-467f | TT | 2007-09-15
1234rty-863d-4e4f | CRKT | 2005-01-01
1234rty-863d-4e4f | HKY | 2012-07-01
Further reading:
PostgreSQL Array Functions
PostgreSQL type cast :: operator

How to select unique sessions per unique dates with SQL?

I'm struggling with my SQL. I want to select all unique sessions on unique dates from a table. I don't get the results I want.
Example of table:
session_id | date
87654321 | 2020-05-22 09:10:10
12345678 | 2020-05-23 10:19:50
12345678 | 2020-05-23 10:20:23
87654321 | 2020-05-23 12:00:10
This is my SQL right now. I select all distinct dates from a datetime column. I also count all distinct session_id's. I group them by date.
SELECT DISTINCT DATE_FORMAT(`date`, '%d-%m-%Y') as 'date', COUNT(DISTINCT `session_id`) as 'count' FROM `logging` GROUP BY 'date'
What I want to see is (with example above):
date | count
22-05-2020 | 1
23-05-2020 | 2
The result I get with my real table (with 354 sessions on 3 different dates) right now is:
date | count
21-05-2020 | 200
Edit
Changes ` to '.
The name of the field and the name of the alias are the same (date). Please try to use a different name for the alias to avoid confusion in the GROUP BY part.
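You probably want to group on your date expression. In MySQL, 'date' in single quotes is a string literal, so GROUP BY 'date' groups every row by the same constant and returns a single row: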
SELECT DATE_FORMAT(`date`, '%d-%m-%Y') as `date`, COUNT(DISTINCT `session_id`) as `count` FROM `logging` GROUP BY DATE_FORMAT(`date`, '%d-%m-%Y')
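Grouping on the date expression can be tried out with the sample rows from the question. The sketch below uses SQLite through Python's sqlite3 module only so that it runs standalone; SQLite's strftime stands in for MySQL's DATE_FORMAT, and the function name sessions_per_day is made up for illustration.

```python
import sqlite3


def sessions_per_day(rows):
    """Count distinct session ids per calendar day (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logging (session_id INTEGER, date TEXT)")
    conn.executemany("INSERT INTO logging VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT strftime('%d-%m-%Y', date) AS day,
               COUNT(DISTINCT session_id) AS sessions
        FROM logging
        GROUP BY strftime('%d-%m-%Y', date)  -- group on the expression, not a literal
        ORDER BY MIN(date)
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (87654321, "2020-05-22 09:10:10"),
        (12345678, "2020-05-23 10:19:50"),
        (12345678, "2020-05-23 10:20:23"),
        (87654321, "2020-05-23 12:00:10"),
    ]
    print(sessions_per_day(sample))
```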

Oracle SQL condition in range of dates

I need an Oracle SQL query to get all the rows that satisfy this condition:
I have a table of products, each with a validity start date and a validity end date. As input I have a date range (e.g. 20170530 to 20170630). I would like to get all the products that are valid in the given range. Thank you
Edit:
You are right, I will try to be clearer with an example.
I have a table PRODUCTS in which I have two fields: START_DATE and END_DATE (yyyymmdd)
PRODUCTS
----------------------------
|id | start_date | end_date |
----------------------------
|1 | 20170101 | 20171230 |
|2 | 20170501 | 20170705 |
|3 | 20170101 | 20170501 |
|4 | 20170601 | 20170620 |
|5 | 20171010 | 20171110 |
|6 | 20170110 | 20170610 |
I would like to extract all the products that are valid in the range 20170530-20170630. This means that the product's validity period must fall in the given range 20170530-20170630.
So, from the table above, I would extract the products with id
1
2
4
6
Thank you
** SOLVED Edit 2 **
OK, what I wanted was to get the rows whose dates overlap the input date range given as a parameter. There is a simple condition for this:
(StartDate1 <= EndDate2) and (StartDate2 <= EndDate1)
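The overlap condition can be checked against the sample PRODUCTS rows with a small Python sketch. It is an illustration only, not Oracle SQL: the function names are hypothetical, and the yyyymmdd values are compared as plain integers, which works because that format sorts chronologically.

```python
def overlaps(start1, end1, start2, end2):
    """Return True if the inclusive ranges [start1, end1] and [start2, end2] overlap."""
    return start1 <= end2 and start2 <= end1


def valid_products(products, range_start, range_end):
    """Return the ids of products whose validity period overlaps the input range."""
    return [
        product_id
        for product_id, start_date, end_date in products
        if overlaps(start_date, end_date, range_start, range_end)
    ]


# Sample rows copied from the question: (id, start_date, end_date) as yyyymmdd.
PRODUCTS = [
    (1, 20170101, 20171230),
    (2, 20170501, 20170705),
    (3, 20170101, 20170501),
    (4, 20170601, 20170620),
    (5, 20171010, 20171110),
    (6, 20170110, 20170610),
]

if __name__ == "__main__":
    print(valid_products(PRODUCTS, 20170530, 20170630))
```

Products 3 and 5 are excluded because their validity ends before the range starts or begins after it ends.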
Your question is not clear, but here is my interpretation of it. You have a table such as this:
Figure 1: My Product Table
If you want all products that are valid for the range: 09/07/2017 to 11/07/2017 then you would expect ITEM 1 and ITEM 2 to be returned. The SQL Query would look something like this:
SELECT *
FROM MY_PRODUCT_TABLE
WHERE MY_START_DATE BETWEEN START_DATE AND END_DATE
AND MY_END_DATE BETWEEN START_DATE AND END_DATE
Remember that BETWEEN is inclusive, meaning that the boundary values START_DATE and END_DATE themselves are matched as well.
Note: If you are using string variables as input, it would be wise to use the TO_DATE function, i.e. TO_DATE(MY_START_DATE, 'DD.MM.YYYY') etc., depending on the format entered.

sum time with specific delimiter

Right now I have a problem summing times based on a specific condition. For example, I have something like this.
I have to add up the work time per activity date, but only for rows whose approval status is Approve.
So for example I have something like this
-----------------------------------------------
| Activity Date | ApprovalStatus | WorkTime |
-----------------------------------------------
| 2017-01-06 | Rejected | 01:00:00 |
-----------------------------------------------
| 2017-01-06 | Approve | 03:00:00 |
-----------------------------------------------
| 2017-01-06 | Waiting | 02:00:00 |
-----------------------------------------------
| 2017-01-06 | Approve | 01:00:00 |
-----------------------------------------------
From this example, only the approved work time should be summed, so the expected result is 04:00:00, as shown below.
-----------------------------------------------
| Activity Date | ApprovalStatus | WorkTime |
-----------------------------------------------
| 2017-01-06 | Approved | 04:00:00 |
-----------------------------------------------
Is there any enlightenment to solve this problem?
PS: I am using SQL Server 2014. Hope you can help me, thank you!!
Try the following.
Schema:
SELECT * INTO #TAB FROM(
SELECT '2017-01-06' AS Activity_Date
, 'Rejected' AS ApprovalStatus
, '01:00:00' AS WorkTime
UNION ALL
SELECT '2017-01-06' , 'Approve' , '03:00:00'
UNION ALL
SELECT '2017-01-06' , 'Waiting' , '02:00:00'
UNION ALL
SELECT '2017-01-06' , 'Approve' , '01:00:00'
)A
Now sum the hours by grouping on the date:
SELECT [Activity_Date]
,CAST(DATEADD(HH,SUM( DATEDIFF(HH,'00:00:00',WorkTime)),'00:00:00') AS TIME(0))
FROM #TAB
WHERE ApprovalStatus='Approve'
GROUP BY [Activity_Date]
Result:
+---------------+------------------+
| Activity_Date | (No column name) |
+---------------+------------------+
| 2017-01-06 | 04:00:00 |
+---------------+------------------+
UPDATE:
The SUM function only accepts exact numeric or approximate numeric data types. It won't accept date or time data types for summation.
This is documented in SUM (Transact-SQL) on the Microsoft website.
SUM ( [ ALL | DISTINCT ] expression )
expression
Is a constant, column, or function, and any combination of
arithmetic, bitwise, and string operators. expression is an expression
of the exact numeric or approximate numeric data type category, except
for the bit data type. Aggregate functions and subqueries are not
permitted.
So you have to write your own logic to get the sum of time values. The query below calculates the sum of time with millisecond precision.
SELECT [Activity_Date]
,CAST(DATEADD(ms, SUM(DATEDIFF(ms, '00:00:00.000', WorkTime)), '00:00:00.000') as time(0))
FROM #TAB
WHERE ApprovalStatus='Approve'
GROUP BY [Activity_Date]
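The idea behind the query (turn each time into a duration, sum the durations of the approved rows per date, then format the total as a time again) can be sketched in Python. This is an illustration of the logic, not T-SQL; the function names are hypothetical.

```python
from datetime import timedelta


def parse_hms(text):
    """Turn 'HH:MM:SS' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a timedelta as 'HH:MM:SS' (hours are not wrapped at 24)."""
    total = int(delta.total_seconds())
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d}"


def approved_totals(rows):
    """Sum WorkTime per activity date, counting only rows with status 'Approve'."""
    totals = {}
    for activity_date, status, work_time in rows:
        if status == "Approve":
            totals[activity_date] = totals.get(activity_date, timedelta()) + parse_hms(work_time)
    return {day: format_hms(total) for day, total in totals.items()}


if __name__ == "__main__":
    sample = [
        ("2017-01-06", "Rejected", "01:00:00"),
        ("2017-01-06", "Approve", "03:00:00"),
        ("2017-01-06", "Waiting", "02:00:00"),
        ("2017-01-06", "Approve", "01:00:00"),
    ]
    print(approved_totals(sample))
```

One difference worth noting: a total of 24 hours or more is printed here as is (e.g. 25:30:00), whereas casting the result to TIME in the query keeps only the time-of-day part.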
You can filter the records by ApprovalStatus and do a summation on worktime by grouping it by activity date.
Use this, if you want to add only the hour part.
SELECT SUM(DATEDIFF(HH,'00:00:00',WorkTime)) AS [TotalWorktime]
FROM [YourTable]
WHERE ApprovalStatus = 'Approve'
GROUP BY [Activity Date]
OR
Use this if you want to include the minutes as well. The result is a number of the form H.MM, e.g. 4.30 for 4 hours 30 minutes.
SELECT SUM(DATEDIFF(MINUTE,'0:00:00',CONVERT(TIME,WorkTime)))/60 + (SUM(DATEDIFF(MINUTE,'0:00:00',CONVERT(TIME,WorkTime)))%60)/100.0 AS [TotalWorktime]
FROM [YourTable]
WHERE ApprovalStatus = 'Approve'
GROUP BY [Activity Date]

MySQL: daily average value

I have a table with a 'timestamp' column and a 'value' column where the values are roughly 3 seconds apart.
I'm trying to return a table that has daily average values.
So, something like this is what i'm looking for.
| timestamp | average |
| 2010-06-02 | 456.6 |
| 2010-06-03 | 589.4 |
| 2010-06-04 | 268.5 |
etc...
Any help on this would be greatly appreciated.
SELECT DATE(timestamp), AVG(value)
FROM table
GROUP BY DATE(timestamp)
Since you want the day instead of each timestamp:
select DATE(timestamp), AVG(value)
from TABLE
group by DATE(timestamp)
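To see the grouping on DATE(timestamp) in action without a MySQL server, here is a minimal sketch using SQLite through Python's sqlite3 module, where DATE() behaves the same way for this purpose. The table name readings, the function name daily_average and the sample values are assumptions made up for the example.

```python
import sqlite3


def daily_average(rows):
    """Average the value column per calendar day (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    # 'readings' is a hypothetical table name; the question does not give one.
    conn.execute("CREATE TABLE readings (timestamp TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT DATE(timestamp) AS day, AVG(value) AS average "
        "FROM readings GROUP BY DATE(timestamp) ORDER BY day"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("2010-06-02 00:00:00", 400.0),
        ("2010-06-02 00:00:03", 500.0),
        ("2010-06-03 00:00:00", 600.0),
    ]
    print(daily_average(sample))
```

Grouping on the raw timestamp instead would return one row per reading, since the values are about 3 seconds apart.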
This assumes that your timestamp column only contains information about the day, but not the time. That way, the dates can be grouped together:
select timestamp, AVG(value) as average
from TABLE_NAME
group by timestamp