Postgres Array Issue - sql

I have a table as shown below and want to load its data, in the output format shown, into another table:
Input Table Data(Tempabc):
ID,COURSE,ENROLL_DT
'12345fgh-2bce-467f',array['BB','TT',''],array['01/07/2007 12:00:00 AM','15/09/2007 12:00:00 AM',''],
'1234rty-863d-4e4f',array['CRKT','HKY',''],array['01/01/2005 12:00:00 AM','01/07/2012 12:00:00 AM','']
Output Data:
ID,COURSE,ENROLL_DT
'12345fgh-2bce-467f',array['BB','TT'],array['01/07/2007','15/09/2007'],
'1234rty-863d-4e4f',array['CRKT','HKY'],array['01/01/2005','01/07/2012']
Can you please help? I have used the query below, but I am unable to extract the date from the third column. The third column is a varchar array when imported from a file, but I want to load it into a target table where it is a Date array column:
SELECT ID,
ARRAY_REMOVE(COURSE,'') AS COURSE,ARRAY_REMOVE(ENROLL_DT,'') AS ENROLL_DT
FROM TEMPABC;
However, I am still unable to extract the date from the ENROLL_DT column. Is there a way to extract the date? Can someone please suggest an approach?

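If you want to remove the blank elements of the arrays and change their data type, you could array_remove, unnest, cast the values and finally group them again with array_agg. Note that the ::date cast reads these strings day-first only when DateStyle is set to DMY (for example SET datestyle = 'ISO, DMY'); under an MDY setting, '15/09/2007' is rejected as out of range. For example: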
WITH tempabc (id,course,enroll_dt) AS (
VALUES
('12345fgh-2bce-467f',array['BB','TT',''],array['01/07/2007 12:00:00 AM','15/09/2007 12:00:00 AM','']),
('1234rty-863d-4e4f',array['CRKT','HKY',''],array['01/01/2005 12:00:00 AM','01/07/2012 12:00:00 AM',''])
)
SELECT id, array_agg(course) AS course, array_agg(enroll_dt) AS enroll_dt FROM (
SELECT id,
unnest(array_remove(course,'')) AS course,
unnest(array_remove(enroll_dt,''))::date AS enroll_dt
FROM tempabc) q
GROUP BY id;
id | course | enroll_dt
--------------------+------------+-------------------------
12345fgh-2bce-467f | {BB,TT} | {2007-07-01,2007-09-15}
1234rty-863d-4e4f | {CRKT,HKY} | {2005-01-01,2012-07-01}
If you're aiming to create a record for each array value, just array_remove and unnest, e.g.
WITH tempabc (id,course,enroll_dt) AS (
VALUES
('12345fgh-2bce-467f',array['BB','TT',''],array['01/07/2007 12:00:00 AM','15/09/2007 12:00:00 AM','']),
('1234rty-863d-4e4f',array['CRKT','HKY',''],array['01/01/2005 12:00:00 AM','01/07/2012 12:00:00 AM',''])
)
SELECT id,
unnest(array_remove(course,'')) AS course,
unnest(array_remove(enroll_dt,''))::date AS enroll_dt
FROM tempabc;
id | course | enroll_dt
--------------------+--------+------------
12345fgh-2bce-467f | BB | 2007-07-01
12345fgh-2bce-467f | TT | 2007-09-15
1234rty-863d-4e4f | CRKT | 2005-01-01
1234rty-863d-4e4f | HKY | 2012-07-01
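The same clean-up (drop the blank elements, then convert the remaining strings to dates) can also be done in a loader script before the data reaches the target table. The following is only a sketch of that idea in Python, not part of the answer above; the function name clean_row and the format string are assumptions based on the sample rows, which look like day/month/year with a 12-hour clock.

```python
from datetime import datetime

# Assumed input format, inferred from the sample rows: day/month/year, 12-hour clock.
SOURCE_FORMAT = "%d/%m/%Y %I:%M:%S %p"


def clean_row(row_id, courses, enroll_dts):
    """Drop blank array elements and convert the timestamp strings to dates."""
    kept_courses = [c for c in courses if c != ""]
    kept_dates = [
        datetime.strptime(d, SOURCE_FORMAT).date() for d in enroll_dts if d != ""
    ]
    return row_id, kept_courses, kept_dates


row = clean_row(
    "12345fgh-2bce-467f",
    ["BB", "TT", ""],
    ["01/07/2007 12:00:00 AM", "15/09/2007 12:00:00 AM", ""],
)
print(row[1], [d.isoformat() for d in row[2]])
```

Parsing with an explicit format avoids depending on a session setting such as DateStyle, at the cost of failing loudly on any value that does not match the assumed pattern.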
Further reading:
PostgreSQL Array Functions
PostgreSQL type cast :: operator

Related

How to get a value inside of a JSON that is inside a column in a table in Oracle sql?

Suppose that I have a table named agents_timesheet that has a structure like this:
ID | name | health_check_record | date | clock_in | clock_out
---------------------------------------------------------------------------------------------------------
1 | AAA | {"mental":{"stress":"no", "depression":"no"}, | 6-Dec-2021 | 08:25:07 |
| | "physical":{"other_symptoms":"headache", "flu":"no"}} | | |
---------------------------------------------------------------------------------------------------------
2 | BBB | {"mental":{"stress":"no", "depression":"no"}, | 6-Dec-2021 | 08:26:12 |
| | "physical":{"other_symptoms":"no", "flu":"yes"}} | | |
---------------------------------------------------------------------------------------------------------
3 | CCC | {"mental":{"stress":"no", "depression":"severe"}, | 6-Dec-2021 | 08:27:12 |
| | "physical":{"other_symptoms":"cancer", "flu":"yes"}} | | |
Now I need to get all agents who have the flu on a given day. For a single JSON document in Oracle SQL, I can already get the flu value with this SQL statement:
SELECT * FROM JSON_TABLE(
'{"mental":{"stress":"no", "depression":"no"}, "physical":{"fever":"no", "flu":"yes"}}', '$'
COLUMNS (flu VARCHAR2(3) PATH '$.physical.flu')
);
I can read the health_check_record column itself with a plain SELECT statement.
But how do I get the values of flu from the JSON stored in the health_check_record column of that table?
Additional question
Based on the table, how can I retrieve the full list of other_symptoms, so that I get this kind of output:
ID | name | other_symptoms
-------------------------------
1 | AAA | headache
2 | BBB | no
3 | CCC | cancer
You can use the JSON_EXISTS() function.
SELECT *
FROM agents_timesheet
WHERE JSON_EXISTS(health_check_record, '$.physical?(@.flu == "yes")');
There is also the "plain old way" without JSON parsing, treating the column like a standard VARCHAR one. This will not work in 100% of cases (for example, it breaks if there is whitespace after the colon), but if your data always looks the way you described, it might be sufficient.
SELECT *
FROM agents_timesheet
WHERE health_check_record LIKE '%"flu":"yes"%';
How to get the values of flu in the JSON in the health_check_record of that table?
From Oracle 12, to get the values you can use JSON_TABLE with a correlated CROSS JOIN to the table:
SELECT a.id,
a.name,
j.*,
a."DATE",
a.clock_in,
a.clock_out
FROM agents_timesheet a
CROSS JOIN JSON_TABLE(
a.health_check_record,
'$'
COLUMNS (
mental_stress VARCHAR2(3) PATH '$.mental.stress',
mental_depression VARCHAR2(10) PATH '$.mental.depression',
physical_fever VARCHAR2(3) PATH '$.physical.fever',
physical_flu VARCHAR2(3) PATH '$.physical.flu'
)
) j
WHERE physical_flu = 'yes';
You can use "dot notation" to access data from a JSON column. Like this:
select "DATE", id, name
from agents_timesheet t
where t.health_check_record.physical.flu = 'yes'
;
DATE ID NAME
----------- --- ----
06-DEC-2021 2 BBB
Note that this approach requires that you use an alias for the table name (so you can use it in accessing the JSON data).
For testing I used the data posted by MT0 on dbfiddle. I am not a big fan of double-quoted column names; use something else for "DATE", such as dt or date_.
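None of the answers above covers the additional question about listing other_symptoms. As an illustration of what both extractions amount to, here is a small Python sketch that parses the sample documents from the question with the standard json module. It is not Oracle code; the helper name physical and the in-memory rows list are assumptions made for the example.

```python
import json

# Sample rows copied from the question: (id, name, health_check_record).
rows = [
    (1, "AAA", '{"mental":{"stress":"no", "depression":"no"}, '
               '"physical":{"other_symptoms":"headache", "flu":"no"}}'),
    (2, "BBB", '{"mental":{"stress":"no", "depression":"no"}, '
               '"physical":{"other_symptoms":"no", "flu":"yes"}}'),
    (3, "CCC", '{"mental":{"stress":"no", "depression":"severe"}, '
               '"physical":{"other_symptoms":"cancer", "flu":"yes"}}'),
]


def physical(record):
    """Return the 'physical' object of one JSON document (empty if absent)."""
    return json.loads(record).get("physical", {})


# Agents whose record says flu == "yes".
with_flu = [(i, name) for i, name, rec in rows if physical(rec).get("flu") == "yes"]

# Full list of other_symptoms, one entry per agent.
other_symptoms = [(i, name, physical(rec).get("other_symptoms")) for i, name, rec in rows]

print(with_flu)
print(other_symptoms)
```

In Oracle itself, the same list would presumably come from adding an other_symptoms column with PATH '$.physical.other_symptoms' to the JSON_TABLE query and dropping the WHERE clause.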

Convert timestamp value from string to timestamp hive

I have a timestamp value stored as a string in a Hive table and want to convert it to the timestamp type.
I tried the following code:
select date_value, FROM_UNIXTIME(UNIX_TIMESTAMP(date_value, 'dd-MMM-YY HH.mm.ss')) from sales limit 2;
The original time and the result are as follows:
Original time result
07-NOV-12 17.07.03 2012-01-01 17:07:03
25-FEB-13 04.26.53 2012-12-30 04:26:53
What's wrong in my script?
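Use yy instead of YY. In the Java date patterns that Hive uses here, Y is the week-based year, while y is the calendar year: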
select date_value
,FROM_UNIXTIME(UNIX_TIMESTAMP(date_value, 'dd-MMM-yy HH.mm.ss')) as ts
from sales
;
+--------------------+---------------------+
| date_value | ts |
+--------------------+---------------------+
| 07-NOV-12 17.07.03 | 2012-11-07 17:07:03 |
| 25-FEB-13 04.26.53 | 2013-02-25 04:26:53 |
+--------------------+---------------------+
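For comparison, the same string layout can be parsed outside Hive. This is a minimal Python sketch, assuming an English (or C) locale for the month abbreviations; the function name parse_sales_timestamp is made up for the example. Here %y plays the role of yy, the two-digit calendar year.

```python
from datetime import datetime


def parse_sales_timestamp(text):
    """Parse strings like '07-NOV-12 17.07.03' into a datetime.

    %y is the two-digit calendar year; %b is the abbreviated month name,
    which strptime matches case-insensitively.
    """
    return datetime.strptime(text, "%d-%b-%y %H.%M.%S")


parsed = [parse_sales_timestamp(s) for s in ("07-NOV-12 17.07.03", "25-FEB-13 04.26.53")]
print([p.isoformat(sep=" ") for p in parsed])
```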

Query Max/Min value shows original values

I'm taking the max value and min value of a table composed of Date, Time, and Temp (the load reading). For example:
Date | Time | Temp
-------------------------------------
1/1/2014 | 09:00:00 AM | 100
-------------------------------------
1/1/2014 | 09:01:00 AM | 110
-------------------------------------
1/1/2014 | 09:02:00 AM | 120
-------------------------------------
1/1/2014 | 09:03:00 AM | 111
-------------------------------------
....................And so on
I've tried to just use the functions Min() and Max(), but the query returns the same rows as the original table. See SQL code:
SELECT Table1.Date, Table1.Time, Min(Table1.Temp) AS MinLoad
FROM Table1
GROUP BY Table1.Date, Table1.Time;
I tried using the DMin() and DMax() functions, but instead of getting a value I got Null. I tried the syntax
DMin("[Temp]", "[Table1]", [Time] Between #09:00# And #15:00#)
I'm fairly new to Access so any help would be appreciated.
Thanks!
Figured it out: group by the date only, so that all readings for a day fall into one group:
SELECT Date.DateLog, Min(Table1.Data) AS MinOfData
FROM [Date] INNER JOIN Table1 ON Date.DateLog = Table1.Date
GROUP BY Date.DateLog;
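The difference between the two groupings can be reproduced with a throwaway in-memory database. The sketch below uses Python's sqlite3 module rather than Access, and the table and column names (readings, log_date, log_time, temp) are assumptions chosen for the example; the point is only that grouping by date and time leaves one row per group, while grouping by date alone aggregates the day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (log_date TEXT, log_time TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2014-01-01", "09:00:00", 100),
        ("2014-01-01", "09:01:00", 110),
        ("2014-01-01", "09:02:00", 120),
        ("2014-01-01", "09:03:00", 111),
    ],
)

# Grouping by date AND time: every row is its own group, so MIN changes nothing.
per_row = conn.execute(
    "SELECT log_date, log_time, MIN(temp) FROM readings GROUP BY log_date, log_time"
).fetchall()

# Grouping by date only: one group per day, so MIN and MAX aggregate the day.
per_day = conn.execute(
    "SELECT log_date, MIN(temp), MAX(temp) FROM readings GROUP BY log_date"
).fetchall()

print(len(per_row), per_day)
```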

SQL: earliest date from set of date fields

I have a series of dates associated with a unique identifier in a table. For example:
1 | 1999-04-01 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2008-12-01 |
2 | 1999-04-06 | 2000-04-01 | 0000-00-00 | 0000-00-00 | 2010-04-03 |
3 | 1999-01-09 | 0000-00-00 | 0000-00-00 | 0000-00-00 | 2007-09-03 |
4 | 1999-01-01 | 0000-00-00 | 1997-01-01 | 0000-00-00 | 2002-01-04 |
Is there a way to select the earliest date from the predefined list of DATE fields using a straightforward SQL command?
So the expected output would be:
1 | 1999-04-01
2 | 1999-04-06
3 | 1999-01-09
4 | 1997-01-01
I am guessing this is not possible but I wanted to ask and make sure. My current solution in mind involves putting all the dates in a temporary table and then using that to get the MIN()
thanks
Edit: The problem with using LEAST() as stated is that the new behaviour is to return NULL if any of the columns is NULL. In a series of dates like the dataset in question, any date might be NULL. I would like to obtain the earliest actual date from the set of dates.
SOLUTION: Used a combination of LEAST() and IF() in order to filter out NULL dates.
SELECT LEAST( IF(date1=0,NOW(),date1), IF(date2=0,NOW(),date2), [...] );
Lessons learnt a) COALESCE does not treat '0000-00-00' as a NULL date, b) LEAST will return '0000-00-00' as the smallest value - I would guess this is due to internal integer comparison(?)
select id, least(date_col_a, date_col_b, date_col_c) from table
Update, replacing '0000-00-00' with a far-future date so it is never the smallest:
select id, least (
case when date_col_a = '0000-00-00' then now() + interval 100 year else date_col_a end,
case when date_col_b = '0000-00-00' then now() + interval 100 year else date_col_b end) from table
Actually you can do it like below, or using a large case structure, or with least(date1, date2, dateN), but with that, null could be the minimum value.
select rowid, min(date)
from
( select rowid, date1 from table
union all
select rowid, date2 from table
union all
select rowid, date3 from table
/* and so on */
)
group by rowid;
HTH
select
id,
least(coalesce(date1, '9999-12-31'), ....)
from
table
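All of the answers above share one idea: exclude the placeholder values ('0000-00-00' or NULL) before taking the minimum. A minimal Python sketch of that rule, using the rows from the question, is shown below. The function name earliest_real_date is an assumption, and the dates are kept as ISO strings, which compare correctly as plain text.

```python
ZERO_DATE = "0000-00-00"

# Rows from the question: id followed by its date columns.
rows = {
    1: ["1999-04-01", ZERO_DATE, ZERO_DATE, ZERO_DATE, "2008-12-01"],
    2: ["1999-04-06", "2000-04-01", ZERO_DATE, ZERO_DATE, "2010-04-03"],
    3: ["1999-01-09", ZERO_DATE, ZERO_DATE, ZERO_DATE, "2007-09-03"],
    4: ["1999-01-01", ZERO_DATE, "1997-01-01", ZERO_DATE, "2002-01-04"],
}


def earliest_real_date(dates):
    """Return the smallest date, ignoring None and the zero-date placeholder."""
    real = [d for d in dates if d is not None and d != ZERO_DATE]
    return min(real) if real else None


earliest = {row_id: earliest_real_date(dates) for row_id, dates in rows.items()}
print(earliest)
```

Unlike the far-future sentinel used in the SQL versions, this returns None when a row has no real date at all, which may or may not be what the caller wants.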

Is it possible to temporarily duplicate and modify rows on the fly in an SQL SELECT query?

I've just received a new data source for my application which inserts data into a Derby database only when it changes. Normally, missing data is fine - I'm drawing a line chart with the data (value over time), and I'd just draw a line between the two points, interpolating the expected value at any given point. The problem is that missing data in this case means the value has not changed, so it should be drawn as a flat line, and the graph would be incorrect if I connected the points directly.
There are two ways I could fix this: I could create a new class that handles missing data differently (which could be difficult due to the way prefuse, the drawing library I'm using, handles drawing), or I could duplicate the rows, leaving the y value the same while changing the x value in each row. I could do this in the Java that bridges the database and the renderer, or I could modify the SQL.
My question is, given a result set like the one below:
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 10 | 2000-01-01 08:00:05 |
| 11 | 2000-01-01 08:00:07 |
| 2 | 2000-01-01 08:00:13 |
| 4 | 2000-01-01 08:00:16 |
+-------+---------------------+
Assuming I query it at 8:00:20, how can I make it look like the following using SQL? Basically, I'm duplicating the row for every second until it's already taken. received is, for all intents and purposes, unique (it's not, but it will be due to the WHERE clause in the query).
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 7 | 2000-01-01 08:00:01 |
| 7 | 2000-01-01 08:00:02 |
| 7 | 2000-01-01 08:00:03 |
| 7 | 2000-01-01 08:00:04 |
| 10 | 2000-01-01 08:00:05 |
| 10 | 2000-01-01 08:00:06 |
| 11 | 2000-01-01 08:00:07 |
| 11 | 2000-01-01 08:00:08 |
| 11 | 2000-01-01 08:00:09 |
| 11 | 2000-01-01 08:00:10 |
| 11 | 2000-01-01 08:00:11 |
| 11 | 2000-01-01 08:00:12 |
| 2 | 2000-01-01 08:00:13 |
| 2 | 2000-01-01 08:00:14 |
| 2 | 2000-01-01 08:00:15 |
| 4 | 2000-01-01 08:00:16 |
| 4 | 2000-01-01 08:00:17 |
| 4 | 2000-01-01 08:00:18 |
| 4 | 2000-01-01 08:00:19 |
| 4 | 2000-01-01 08:00:20 |
+-------+---------------------+
Thanks for your help.
Due to the set-based nature of SQL, there's no simple way to do this. I have used two solution strategies:
a) use a loop to go from the initial to the end date time, get the value for each step, and insert it into a temp table
b) generate a table (normal or temporary) with 1-minute increments; by adding the base date time to these increments you can generate the steps.
Example of approach b) (SQL Server version)
Let's assume we will never query more than 24 hours of data. We create a table intervals that has a dttm field with the minute count for each step. That table must be populated previously.
select dateadd(minute,dttm,'2000-01-01 08:00') received,
(select top 1 value from table where received <=
dateadd(minute,dttm,'2000-01-01 08:00')
order by received desc) value
from intervals
It seems like in this case you really don't need to generate all of these datapoints. Would it be correct to generate the following instead? If it's drawing a straight line, you don't need to generate a data point for each second, just two for each datapoint: one at the current time, one right before the next time. The query below subtracts 5 ms from the next time, but you could make it a full second if you need it (the table shows the full-second variant).
+-------+---------------------+
| value | received |
+-------+---------------------+
| 7 | 2000-01-01 08:00:00 |
| 7 | 2000-01-01 08:00:04 |
| 10 | 2000-01-01 08:00:05 |
| 10 | 2000-01-01 08:00:06 |
| 11 | 2000-01-01 08:00:07 |
| 11 | 2000-01-01 08:00:12 |
| 2 | 2000-01-01 08:00:13 |
| 2 | 2000-01-01 08:00:15 |
| 4 | 2000-01-01 08:00:16 |
| 4 | 2000-01-01 08:00:20 |
+-------+---------------------+
If that's the case, then you can do the following:
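SELECT * FROM
(SELECT * from TimeTable as t1
UNION
-- each value again, 5 ms before the next reading arrives
SELECT t2.value, dateadd(ms, -5, t2.received)
from ( Select t3.value, (select top 1 t4.received
from TimeTable t4
where t4.received > t3.received
order by t4.received asc) as received
from TimeTable t3) as t2
UNION
-- the latest value again at the current time; TOP with ORDER BY has to sit
-- in its own derived table, because ORDER BY is not allowed inside a UNION branch
SELECT t6.value, GETDATE()
from (select top 1 value from TimeTable
order by received desc) as t6
) as t5
where received IS NOT NULL
order by t5.received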
The big advantage of this is that it is a set based solution and will be much faster than any iterative approach.
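If the duplication ends up in the application layer instead (the question mentions the code bridging the database and the renderer as an option), the two-points-per-reading idea is short to express. The following Python sketch is only an illustration under that assumption; the function name step_points and the 5 ms default gap are choices made for the example, not part of the answer.

```python
from datetime import datetime, timedelta


def step_points(points, query_time, gap=timedelta(milliseconds=5)):
    """Emit each reading twice: at its own time and just before the next one.

    points must be (value, received) tuples sorted by received; the last
    reading is extended to query_time.
    """
    result = []
    for index, (value, received) in enumerate(points):
        result.append((value, received))
        if index + 1 < len(points):
            end = points[index + 1][1] - gap
        else:
            end = query_time
        result.append((value, end))
    return result


base = datetime(2000, 1, 1, 8, 0, 0)
readings = [(7, base), (10, base + timedelta(seconds=5)), (11, base + timedelta(seconds=7)),
            (2, base + timedelta(seconds=13)), (4, base + timedelta(seconds=16))]
stepped = step_points(readings, base + timedelta(seconds=20))
print(len(stepped))
```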
You could just walk a cursor, keep variables for the last value and time returned, and if the current row is more than a second ahead, loop one second at a time using the previous value and the new time until you reach the current row's time.
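A rough sketch of that walk, written in Python over an already-fetched, sorted result set rather than a database cursor (an assumption made to keep the example self-contained; fill_seconds is a hypothetical name):

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)


def fill_seconds(points, query_time):
    """Repeat each value once per second until the next reading (or query_time).

    points must be (value, received) tuples sorted by received.
    """
    filled = []
    for index, (value, received) in enumerate(points):
        if index + 1 < len(points):
            stop = points[index + 1][1]       # exclusive: the next reading takes over
        else:
            stop = query_time + ONE_SECOND    # include query_time itself
        current = received
        while current < stop:
            filled.append((value, current))
            current += ONE_SECOND
    return filled


base = datetime(2000, 1, 1, 8, 0, 0)
readings = [(7, base), (10, base + timedelta(seconds=5)), (11, base + timedelta(seconds=7)),
            (2, base + timedelta(seconds=13)), (4, base + timedelta(seconds=16))]
filled = fill_seconds(readings, base + timedelta(seconds=20))
print(len(filled), filled[4], filled[5])
```

With the sample data from the question this yields one row for each second from 08:00:00 to 08:00:20, matching the desired output table.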
Trying to do this in SQL would be painful, and if you went and created the missing data, you would possibly have to add a column to track real versus interpolated data points.
Better would be to have a table for each axial value you want to have on the graph, and then either join to it or even just put the data field there and update that record when/if values arrive.
The "missing values" problem is quite extensive, so I suggest you have a solid policy.
One thing that will happen is that you will have multiple adjacent slots with missing values.
This would be much easier if you could transform it into OLAP data.
Create a simple table that has all the minutes (warning, will run for a while):
Create Table Minutes(Value DateTime Not Null)
Go
Declare @D DateTime
Set @D = '1/1/2000'
While (Year(@D) < 2002)
Begin
Insert Into Minutes(Value) Values(@D)
Set @D = DateAdd(Minute, 1, @D)
End
Go
Create Clustered Index IX_Minutes On Minutes(Value)
Go
You can then use it somewhat like this:
Select
Received = Minutes.Value,
Value = (Select Top 1 Data.Value
From Data
Where Data.Received <= Minutes.Value
Order By Data.Received Desc)
From
Minutes
Where
Minutes.Value Between @Start And @End
I would recommend against solving this in SQL/the database due to the set based nature of it.
Also, you are dealing with seconds here, so I guess you could end up with a lot of rows with the same repeated data that would have to be transferred from the database to your application.
One way to handle this is to left join your data against a table that contains all of the received values. Then, when there is no value for that row, you calculate what the projected value should be based on the previous and next actual values you have.
You didn't say what database platform you are using. In SQL Server, I would create a User Defined Function that accepts a start datetime and end datetime value. It would return a table value with all of the received values you need.
I have simulated it below, which runs in SQL Server. The subselect aliased r is what would actually get returned by the user defined function.
select r.received,
isnull(d.value,(select top 1 data.value from data where data.received < r.received order by data.received desc)) as x
from (
select cast('2000-01-01 08:00:00' as datetime) received
union all
select cast('2000-01-01 08:00:01' as datetime)
union all
select cast('2000-01-01 08:00:02' as datetime)
union all
select cast('2000-01-01 08:00:03' as datetime)
union all
select cast('2000-01-01 08:00:04' as datetime)
union all
select cast('2000-01-01 08:00:05' as datetime)
union all
select cast('2000-01-01 08:00:06' as datetime)
union all
select cast('2000-01-01 08:00:07' as datetime)
union all
select cast('2000-01-01 08:00:08' as datetime)
union all
select cast('2000-01-01 08:00:09' as datetime)
union all
select cast('2000-01-01 08:00:10' as datetime)
union all
select cast('2000-01-01 08:00:11' as datetime)
union all
select cast('2000-01-01 08:00:12' as datetime)
union all
select cast('2000-01-01 08:00:13' as datetime)
union all
select cast('2000-01-01 08:00:14' as datetime)
union all
select cast('2000-01-01 08:00:15' as datetime)
union all
select cast('2000-01-01 08:00:16' as datetime)
union all
select cast('2000-01-01 08:00:17' as datetime)
union all
select cast('2000-01-01 08:00:18' as datetime)
union all
select cast('2000-01-01 08:00:19' as datetime)
union all
select cast('2000-01-01 08:00:20' as datetime)
) r
left outer join Data d on r.received = d.received
If you were in SQL Server, then this would be a good start. I am not sure how close Apache Derby's SQL dialect is to SQL Server's.
Usage: EXEC ElaboratedData '2000-01-01 08:00:00','2000-01-01 08:00:20'
CREATE PROCEDURE [dbo].[ElaboratedData]
@StartDate DATETIME,
@EndDate DATETIME
AS
--if not a valid interval, just quit
IF @EndDate<=@StartDate BEGIN
SELECT 0;
RETURN;
END;
/*
Store the value of 1 second locally, for readability
--*/
DECLARE @OneSecond FLOAT;
SET @OneSecond = (1.00000000/86400.00000000);
/*
create a temp table w/the same structure as the real table.
--*/
CREATE TABLE #SecondIntervals(TSTAMP DATETIME, DATAPT INT);
/*
For each second in the interval, check to see if we have a known value.
If we do, then use that. If not, make one up.
--*/
DECLARE @CurrentSecond DATETIME;
SET @CurrentSecond = @StartDate;
WHILE @CurrentSecond <= @EndDate BEGIN
DECLARE @KnownValue INT;
--reset on every pass: DECLARE inside a loop does not reinitialize the variable
SET @KnownValue = NULL;
SELECT @KnownValue=DATAPT
FROM TESTME
WHERE TSTAMP = @CurrentSecond;
IF (@KnownValue IS NULL) BEGIN
--ok, we have to make up a fake value
DECLARE @MadeUpValue INT;
/*
*******Put whatever logic you want to make up a fake value here
--*/
SET @MadeUpValue = 99;
INSERT INTO #SecondIntervals(
TSTAMP
,DATAPT
)
VALUES(
@CurrentSecond
,@MadeUpValue
);
END; --if we had to make up a value
SET @CurrentSecond = @CurrentSecond + @OneSecond;
END; --while looking thru our values
--finally, return our generated values + real values
SELECT TSTAMP, DATAPT FROM #SecondIntervals
UNION ALL
SELECT TSTAMP, DATAPT FROM TESTME
ORDER BY TSTAMP;
GO
As just an idea, you might want to check out Anthony Molinaro's SQL Cookbook, chapter 9. He has a recipe, "Filling in Missing Dates" (pages 278-281), that covers essentially what you are trying to do. It requires some sort of sequential handling, either via a helper table or by doing the query recursively. While he doesn't have examples for Derby directly, I suspect you could adapt them to your problem (particularly the PostgreSQL or MySQL ones, which seem somewhat platform agnostic).