WHERE condition on newly created column in Impala - sql

In my table, I have time information stored as UNIX time, which I convert to a proper timestamp using the following expression in Impala:
cast(ts DIV 1000 as TIMESTAMP) as NewTime
Now I want to apply a WHERE clause to the newly created column "NewTime" to select data from a particular time period, but I get the following error:
"Could not resolve column/field reference: NewTime".
How can I apply a WHERE clause to the newly created column in Impala?
Thanks.

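You can calculate it in an inner subquery and then filter on it in the outer query. The alias is not visible to the WHERE clause of the same SELECT, but it is visible to the query that wraps it.
select NewTime
from
(select cast(ts DIV 1000 as TIMESTAMP) as NewTime,... from table) subq
where
subq.NewTime > now()
You can also use a CTE, as Gordon said.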
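The CTE variant mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Impala, a hypothetical table named events, and SQLite's datetime(..., 'unixepoch') in place of Impala's cast(... as TIMESTAMP). The shape of the query (compute the alias in the CTE, filter on it outside) is the part that carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: ts holds epoch milliseconds, as in the question.
conn.execute("CREATE TABLE events (ts INTEGER)")
conn.executemany(
    "INSERT INTO events (ts) VALUES (?)",
    [(1577836800000,), (1609459200000,)],  # 2020-01-01 and 2021-01-01 (UTC)
)

# The alias NewTime is defined inside the CTE, so the outer WHERE can use it.
query = """
WITH subq AS (
    SELECT datetime(ts / 1000, 'unixepoch') AS NewTime
    FROM events
)
SELECT NewTime
FROM subq
WHERE NewTime >= '2020-06-01 00:00:00'
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the 2021 row passes the filter, because the comparison runs against the already converted value.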

Related

Hive - calculating string type timestamp differences in minutes

I'm a novice in SQL (in Hive) and am trying to calculate, for every anonymousid, the time spent between its first and last event, in minutes. The source table's timestamp is stored as a string,
like: "2020-12-24T09:47:17.775Z". I've tried two approaches:
1- Cast the timestamp column to bigint and calculate the difference from the main table.
select anonymousid, max(from_unixtime(cast('timestamp' as bigint)) - min(from_unixtime(cast('timestamp' as bigint)) from db1.formevent group by anonymousid
This returned NULLs.
2- Create a new table from the main table, filter it with 'where', and try to convert 'timestamp' to a date format without any min-max calculation.
create table db1.successtime as select anonymousid, pagepath,buttontype, itemname, 'location', cast(to_date(from_unixtime(unix_timestamp('timestamp', "yyyy-MM-dd'T'HH:mm:ss.SSS"),'HH:mm:ss') as date) from db1.formevent where pagepath = "/account/sign-up/" and itemname = "Success" and 'location' = "Standard"
Then I got NULLs again and gave up.
Is there any way I can reformat the timestamp, calculate the time difference in minutes between the first and last event ('timestamp'), and take the average grouped by 'location'?
select anonymousid,
(max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
) / 60
from db1.formevent
group by anonymousid;
From your description, this should work:
select anonymousid,
(max(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")) -
min(unix_timestamp(timestamp, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
) / 60 as minutes_spent
from db1.formevent
group by anonymousid;
Note that the column name is not in single quotes.
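To sanity-check the expected numbers outside Hive, the same first-to-last calculation can be sketched in plain Python. The sample rows and the helper name minutes_between_first_and_last are made up for illustration; only the timestamp format comes from the question.

```python
from datetime import datetime

# Matches strings such as "2020-12-24T09:47:17.775Z" (the trailing Z is a literal).
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Hypothetical sample rows: (anonymousid, timestamp string).
events = [
    ("a1", "2020-12-24T09:47:17.775Z"),
    ("a1", "2020-12-24T10:02:17.775Z"),
    ("b2", "2020-12-24T11:00:00.000Z"),
    ("b2", "2020-12-24T11:30:30.000Z"),
]


def minutes_between_first_and_last(rows):
    """Return {anonymousid: minutes between earliest and latest event}."""
    spans = {}
    for anon_id, raw in rows:
        t = datetime.strptime(raw, FMT)
        first, last = spans.get(anon_id, (t, t))
        spans[anon_id] = (min(first, t), max(last, t))
    return {
        anon_id: (last - first).total_seconds() / 60
        for anon_id, (first, last) in spans.items()
    }


result = minutes_between_first_and_last(events)
print(result)
```

Parsing the string with an explicit pattern is the step the original attempts were missing: casting the quoted literal 'timestamp' to bigint cannot yield a usable value.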

Add date column that based upon other date column SQL BQ

I have a column of dates in my table (referred to as org_day).
I am trying to add a new column that represents the day after, that is
day_after = org_day + 1 day (or 24 hours) (for all rows of org_day)
From what I've read, the DATE_ADD function of SQL does not
work on an entire column, so trying to do something like:
DATE_ADD (org_day, INTERVAL 24 HOUR) or
DATE_ADD (DATE org_day, INTERVAL 24 HOUR)
do not work.
The usual examples that do work look like:
DATE_ADD (DATE '2019-12-22', INTERVAL 1 day),
But I want to perform this operation on the entire column,
not on a constant date.
Appreciate any help.
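To update the entire column, set it for every row in a single UPDATE statement. The target column must already exist, and BigQuery requires a WHERE clause on UPDATE, so use WHERE TRUE to touch all rows:
UPDATE table_name SET day_after = DATE_ADD(org_day, INTERVAL 1 DAY) WHERE TRUE;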
You can try this:
CREATE OR REPLACE TABLE
mydataset.mytable AS
SELECT
org_day,
DATE_ADD(org_day, INTERVAL 1 day) day_after
FROM
mydataset.mytable;
The statement above replaces the existing table with a version that includes the new column. Only the columns listed in the SELECT are kept, so list any other columns you need to preserve.
I would suggest using a view:
create view v_t as
select t.*, date_add(org_day, interval 1 day) as day_after
from t;
If you always want the new column to stay in sync with the existing column, a view ensures that the data is consistent, because the value is calculated when you query the data.
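The "calculated at query time" behaviour of a view can be demonstrated with a small sketch. It uses Python's sqlite3 module rather than BigQuery, so date(org_day, '+1 day') stands in for DATE_ADD(org_day, INTERVAL 1 DAY); the table and view names follow the answer above and the sample dates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (org_day TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("2019-12-22",), ("2019-12-31",)])

# The view computes day_after for every row of the column, not for a constant.
conn.execute(
    "CREATE VIEW v_t AS "
    "SELECT t.*, date(org_day, '+1 day') AS day_after FROM t"
)

select_all = "SELECT org_day, day_after FROM v_t ORDER BY org_day"
before = conn.execute(select_all).fetchall()

# A row added later shows up in the view with no extra maintenance.
conn.execute("INSERT INTO t VALUES ('2020-02-28')")
after = conn.execute(select_all).fetchall()

print(before)
print(after)
```

The second result includes the new row with its day_after already filled in, which is the consistency argument for preferring a view over a stored column.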

Use DataStudio to specify the date range for a custom query in BigQuery, where the date range influences operators in the query

I currently have a DataStudio dashboard connected to a BigQuery custom query.
That BQ query has a hardcoded date range and the status of one of the columns (New_or_Relicensed) can change dynamically for a row, based on the dates specified in the range. I would like to be able to alter that range from DataStudio.
I have tried:
simply connecting the DS dashboard to the custom query in BQ and then introducing a date range filter, but as you can imagine - that does not work because it's operating on an already hard-coded date range.
reviewing similar answers, but their problem doesn't appear to be quite the same E.g. BigQuery Data Studio Custom Query
Here is the query I have in BQ:
SELECT t0.New_Or_Relicensed, t0.Title_Category FROM (WITH
report_range AS
(
SELECT
TIMESTAMP '2019-06-24 00:00:00' AS start_date,
TIMESTAMP '2019-06-30 00:00:00' AS end_date
)
SELECT
schedules.schedule_entry_id AS Schedule_Entry_ID,
schedules.schedule_entry_starts_at AS Put_Up,
schedules.schedule_entry_ends_at AS Take_Down,
schedule_entries_metadata.contract AS Schedule_Entry_Contract,
schedules.platform_id AS Platform_ID,
platforms.platform_name AS Platform_Name,
titles_metadata.title_id AS Title_ID,
titles_metadata.name AS Title_Name,
titles_metadata.category AS Title_Category,
IF (other_schedules.schedule_entry_id IS NULL, "new", "relicensed") AS New_Or_Relicensed
FROM
report_range, client.schedule_entries AS schedules
JOIN client.schedule_entries_metadata
ON schedule_entries_metadata.schedule_entry_id = schedules.schedule_entry_id
JOIN
client.platforms
ON schedules.platform_id = platforms.platform_id
JOIN
client.titles_metadata
ON schedules.title_id = titles_metadata.title_id
LEFT OUTER JOIN
client.schedule_entries AS other_schedules
ON schedules.platform_id = other_schedules.platform_id
AND other_schedules.schedule_entry_ends_at < report_range.start_date
AND schedules.title_id = other_schedules.title_id
WHERE
((schedules.schedule_entry_starts_at >= report_range.start_date AND
schedules.schedule_entry_starts_at <= report_range.end_date) OR
(schedules.schedule_entry_ends_at >= report_range.start_date AND
schedules.schedule_entry_ends_at <= report_range.end_date))
) AS t0 LIMIT 100;
Essentially - I would like to be able to set the start_date and end_date from google data studio, and have those dates incorporated into the report_range that then influences the operations in the rest of the query (that assign a schedule entry as new or relicensed).
Have you looked at using the Custom Query interface of the BigQuery connector in Data Studio to define start_date and end_date as parameters as part of a filter?
Your query would need a little rework.
A custom query can use the @DS_START_DATE and @DS_END_DATE parameters as part of a filter, for example on the creation date column of a table. The records produced by the query are then limited to the date range selected by the report user, which reduces the number of records returned and results in a faster query.
Resources:
Introducing BigQuery parameters in Data Studio
https://www.blog.google/products/marketingplatform/analytics/introducing-bigquery-parameters-data-studio/
Running parameterized queries
https://cloud.google.com/bigquery/docs/parameterized-queries
I had a similar issue where I wanted to incorporate a 30-day lookback before the start date (@DS_START_DATE). In this case I was using Google Analytics UA session data with a table suffix in my WHERE clause. I was able to calculate a date relative to the built-in Data Studio string dates by using the following:
...
WHERE
_table_suffix BETWEEN
CAST(FORMAT_DATE('%Y%m%d', DATE_SUB(PARSE_DATE('%Y%m%d', @DS_START_DATE), INTERVAL 30 DAY)) AS STRING)
AND
CAST(FORMAT_DATE('%Y%m%d', DATE_SUB(PARSE_DATE('%Y%m%d', @DS_END_DATE), INTERVAL 0 DAY)) AS STRING)
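The parse, shift, and re-format steps in that WHERE clause can be checked locally with a short Python sketch. It assumes the Data Studio start and end date parameters arrive as YYYYMMDD strings, which is what the PARSE_DATE('%Y%m%d', ...) calls above imply; the function name suffix_range is hypothetical.

```python
from datetime import datetime, timedelta


def suffix_range(ds_start, ds_end, lookback_days=30):
    """Return the (start, end) table-suffix bounds as YYYYMMDD strings.

    Mirrors the SQL above: parse the string, subtract the lookback from
    the start date only, then format back to a string.
    """
    start = datetime.strptime(ds_start, "%Y%m%d").date() - timedelta(days=lookback_days)
    end = datetime.strptime(ds_end, "%Y%m%d").date()
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")


bounds = suffix_range("20190624", "20190630")
print(bounds)
```

Because the suffixes are fixed-width YYYYMMDD strings, comparing them as strings with BETWEEN gives the same ordering as comparing the dates themselves.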

Trying to UNNEST timestamp array field, but need to GROUP BY

I have a repeated field of type TIMESTAMP in a BigQuery table and am trying to UNNEST it. However, the query fails unless the field is grouped or aggregated. I am not very knowledgeable about SQL, so I could use some help. The code snippet is part of a larger query that works when I substitute GENERATE_TIMESTAMP_ARRAY for subscription.future_renewal_dates.
subscription.future_renewal_dates is ARRAY<TIMESTAMP>.
The TIMESTAMP array is unique (recurring subscriptions) and cannot be generated using GENERATE_TIMESTAMP_ARRAY, so I have to generate the dates before uploading to BigQuery. A UDF would be too much.
SELECT
subscription.amount AS subscription_amount,
subscription.status AS subscription_status,
"1" AS analytic_name,
ARRAY (
SELECT
AS STRUCT FORMAT_TIMESTAMP("%x", days) AS type_value, subscription.amount AS analytic_name
FROM
UNNEST(subscription.future_renewal_dates) as days
WHERE
(
days >= TIMESTAMP("2019-06-05T19:30:02+00:00")
AND days <= TIMESTAMP("2019-08-01T03:59:59+00:00")
)
) AS forecast
FROM
`mydataset.subscription` AS subscription
GROUP BY
subscription_amount,
subscription_status,
analytic_name
I cannot figure out how to unnest subscription.future_renewal_dates without getting the error 'UNNEST expression references subscription.future_renewal_dates which is neither grouped nor aggregated'.
When you use GROUP BY, every expression or column in the SELECT list (except those in the GROUP BY list) must be wrapped in an aggregate function, which yours is not. So you need to decide what you are actually trying to achieve with that grouping.
Below is the option I think you had in mind. It may not be exactly what you want, but it should at least give you an idea of how to fix it:
SELECT
subscription.amount AS subscription_amount,
subscription.status AS subscription_status,
"1" AS analytic_name,
ARRAY_CONCAT_AGG( ARRAY (
SELECT
AS STRUCT FORMAT_TIMESTAMP("%x", days) AS type_value, subscription.amount AS analytic_name
FROM
UNNEST(subscription.future_renewal_dates) as days
WHERE
(
days >= TIMESTAMP("2019-06-05T19:30:02+00:00")
AND days <= TIMESTAMP("2019-08-01T03:59:59+00:00")
)
)) AS forecast
FROM
`mydataset.subscription` AS subscription
GROUP BY
subscription_amount,
subscription_status,
analytic_name
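What the aggregated version computes can be modelled in plain Python: filter each row's array to the window, then concatenate the filtered arrays of all rows that share a group key. The sample subscriptions below are invented, the result holds only the formatted date rather than the full struct, and the element order follows input order here, whereas an aggregate without ORDER BY makes no ordering promise.

```python
from collections import defaultdict
from datetime import datetime, timezone


def ts(year, month, day):
    """Build a UTC timestamp for the sample data."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Window bounds copied from the query above.
WINDOW_START = datetime(2019, 6, 5, 19, 30, 2, tzinfo=timezone.utc)
WINDOW_END = datetime(2019, 8, 1, 3, 59, 59, tzinfo=timezone.utc)

# Hypothetical rows of mydataset.subscription.
subscriptions = [
    {"amount": 10, "status": "active",
     "future_renewal_dates": [ts(2019, 6, 10), ts(2019, 7, 10), ts(2019, 9, 10)]},
    {"amount": 10, "status": "active",
     "future_renewal_dates": [ts(2019, 6, 20)]},
    {"amount": 25, "status": "paused",
     "future_renewal_dates": [ts(2019, 5, 1)]},
]

forecast = defaultdict(list)
for row in subscriptions:
    key = (row["amount"], row["status"])  # the GROUP BY columns
    # Per-row filtered array, like ARRAY(SELECT ... FROM UNNEST(...) WHERE ...).
    in_window = [
        d.strftime("%m/%d/%y")  # MM/DD/YY, the layout "%x" produces in BigQuery
        for d in row["future_renewal_dates"]
        if WINDOW_START <= d <= WINDOW_END
    ]
    # Concatenating across rows of the same group is the aggregate step.
    forecast[key].extend(in_window)

print(dict(forecast))
```

The two rows with amount 10 and status active collapse into one group whose forecast array holds the dates from both rows.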

Access: Compare current field value in subquery

I am trying to create a subquery in MS Access where the HAVING clause compares against a value on the current record. I created the queries separately, but am having a hard time combining them.
I have the following query on a purchase order list (POsFullDetail), which should show the date of the first PO for a given stock number (StockNum):
SELECT POsFullDetail.PO, POsFullDetail.OrderDate, POsFullDetail.StockNum,
(SELECT First(POsFullDetail.OrderDate) AS FirstOfOrderDate
FROM POsFullDetail
GROUP BY POsFullDetail.StockNum
HAVING POsFullDetail.StockNum = POsFullDetail.StockNum.Value
ORDER BY First(POsFullDetail.OrderDate)
) AS First_Date
FROM POsFullDetail;
The part I am struggling with is POsFullDetail.StockNum.Value.
The way it is set up, Access prompts for a value. When I created the subquery separately, I entered the stock number directly.
The subquery gives you the first order date per stocknum.
When using it as a subquery, you are no longer interested in the first order date per stocknum, but in the first order date for the stocknum.
SELECT POsFullDetail.PO, POsFullDetail.OrderDate, POsFullDetail.StockNum,
(
SELECT First(SameStockNum.OrderDate) AS FirstOfOrderDate
FROM POsFullDetail AS SameStockNum
WHERE SameStockNum.StockNum = POsFullDetail.StockNum
) AS First_Date
FROM POsFullDetail;
As you can see, you must use a table alias so that you can link the table to itself. Although both references are to the same table, one is called POsFullDetail and the other SameStockNum, which lets you correlate them with SameStockNum.StockNum = POsFullDetail.StockNum.
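The self-join through an alias can be tried out with a small sketch. It runs on Python's sqlite3 module rather than Access, so it swaps in MIN for Access's First (SQLite has no First aggregate, and MIN returns the earliest date explicitly). The sample purchase orders and the short aliases p and s are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE POsFullDetail (PO TEXT, OrderDate TEXT, StockNum TEXT)")
conn.executemany(
    "INSERT INTO POsFullDetail VALUES (?, ?, ?)",
    [
        ("PO1", "2021-03-01", "A"),
        ("PO2", "2021-01-15", "A"),
        ("PO3", "2021-02-10", "B"),
    ],
)

# The inner query sees the outer row through the alias p, so it returns
# the earliest order date for that row's stock number only.
query = """
SELECT p.PO, p.OrderDate, p.StockNum,
       (SELECT MIN(s.OrderDate)
        FROM POsFullDetail AS s
        WHERE s.StockNum = p.StockNum) AS First_Date
FROM POsFullDetail AS p
ORDER BY p.PO
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both purchase orders for stock number A report 2021-01-15 as First_Date, while the single order for B reports its own date.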