I am using Databricks and have already loaded some data tables.
However, I have a complex SQL query that I want to run against these tables, and I wonder if I could avoid translating it into PySpark.
Is that possible?
To give an example:
In SQL:
with table2 as (
select column1, column1
from database.table1
where
start_date <= DATE '2019-03-01' and
end_date >= DATE '2019-03-31' )
In PySpark I already have table1 loaded, but the following does not work because it cannot find table1.
query = """(
select column1, column1
from table1
where
start_date <= DATE '2019-03-01' and
end_date >= DATE '2019-03-31' )"""
table2 = spark.sql(query)
Thanks
Try using database_name.table_name instead of just the table name in the query.
query = """(
select column1, column1
from database_name.table_name
where
start_date <= DATE '2019-03-01' and
end_date >= DATE '2019-03-31' )"""
If you are using PySpark, then run it with
spark.sql(query)
I am trying to sum column1 (invoice_value) in BigQuery for a specific date, but I want to avoid counting duplicates in column2 (invoice_no).
So far I can sum column1, but the total includes several duplicates of column2 (invoice_no).
SELECT SUM(invoice_value) as INVOICES FROM my_data
WHERE invoice_value IS NOT NULL
AND timestamp >='2021-03-01'
AND timestamp < '2021-03-02'
Help will be greatly appreciated.
You can try the following query to remove duplicate records:
SELECT SUM(invoice_value) as INVOICES FROM
(SELECT DISTINCT invoice_value, invoice_no, timestamp FROM my_data )
WHERE invoice_value IS NOT NULL
AND CAST(timestamp AS TIMESTAMP) >=TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
AND CAST (timestamp AS TIMESTAMP) < CURRENT_TIMESTAMP()
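The de-duplicate-then-sum idea can be tried out locally. The following is only a minimal sketch using Python's built-in sqlite3 module rather than BigQuery; the sample rows, the `ts` column name (standing in for `timestamp`), and the helper `sum_distinct_invoices` are assumptions made for illustration.

```python
import sqlite3


def sum_distinct_invoices(rows, start, end):
    """Sum invoice_value over distinct rows whose ts falls in [start, end)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_data (invoice_no TEXT, invoice_value REAL, ts TEXT)"
    )
    conn.executemany("INSERT INTO my_data VALUES (?, ?, ?)", rows)
    (total,) = conn.execute(
        """
        SELECT SUM(invoice_value)
        FROM (SELECT DISTINCT invoice_no, invoice_value, ts FROM my_data)
        WHERE invoice_value IS NOT NULL
          AND ts >= ? AND ts < ?
        """,
        (start, end),
    ).fetchone()
    conn.close()
    return total


# Hypothetical sample data: invoice A1 was loaded twice.
rows = [
    ("A1", 100.0, "2021-03-01 09:00:00"),
    ("A1", 100.0, "2021-03-01 09:00:00"),  # exact duplicate of the row above
    ("A2", 50.0, "2021-03-01 12:30:00"),
    ("A3", 75.0, "2021-03-02 08:00:00"),   # falls outside the one-day window
]
total = sum_distinct_invoices(rows, "2021-03-01", "2021-03-02")
print(total)
```

Note that DISTINCT only collapses rows that are identical in every selected column; if the same invoice_no can appear with different timestamps, a GROUP BY invoice_no in the subquery would probably be needed instead.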
I have the following query that I am trying to run on Athena.
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > '2017-12-31'
GROUP BY observation_date
However it is producing this error:
SYNTAX_ERROR: line 3:24: '>' cannot be applied to date, varchar(10)
This seems odd to me. Is there an error in my query or is Athena not able to handle greater than operators on date columns?
Thanks!
You need to cast the string literal to a DATE before making this comparison, because Athena will not compare a date column with a varchar. Try the following:
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > CAST('2017-12-31' AS DATE)
GROUP BY observation_date
UPDATE 17/07/2019
To reflect the comments, the DATE function works as well:
SELECT observation_date, COUNT(*) AS count
FROM db.table_name
WHERE observation_date > DATE('2017-12-31')
GROUP BY observation_date
You can also use the date function which is a convenient alias for CAST(x AS date):
SELECT *
FROM date_data
WHERE trading_date >= DATE('2018-07-06');
select * from my_schema.my_table_name where date_column = cast('2017-03-29' as DATE) limit 5
One more thing to add: if your date column is stored as an ISO 8601 string, for example 2022-08-02T01:46:46.963120Z, you can use the parse_datetime function.
In my case, the query looks like this:
SELECT * FROM internal_alb_logs
WHERE elb_status_code >= 500 AND parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') > parse_datetime('2022-08-01-23:00:00','yyyy-MM-dd-HH:mm:ss')
ORDER BY time DESC
See more examples here: https://docs.aws.amazon.com/athena/latest/ug/application-load-balancer-logs.html#query-alb-logs-examples
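The same parse-then-compare logic can be checked outside Athena. This is a rough Python sketch, not Athena code: `ISO_FORMAT` and `is_after` are hypothetical names, and the trailing `Z` is matched as a literal character, so the resulting datetimes are naive (no time zone attached).

```python
from datetime import datetime

# strptime pattern for strings such as 2022-08-02T01:46:46.963120Z
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def is_after(raw_time, cutoff):
    """Parse an ISO 8601 string and test whether it is later than cutoff."""
    return datetime.strptime(raw_time, ISO_FORMAT) > cutoff


cutoff = datetime(2022, 8, 1, 23, 0, 0)
recent = is_after("2022-08-02T01:46:46.963120Z", cutoff)
older = is_after("2022-08-01T22:15:00.000000Z", cutoff)
print(recent, older)
```

Parsing first matters because the comparison then happens between timestamps rather than between raw strings in different formats.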
I need some help with a SQL query. I have a table in a SQL Server database with three fields: Field1, Field2 and DateField. I want to count how many records are returned at any given time where Field1 matches a set variable, Field2 matches a set variable, and DateField falls on the current day (a 24-hour day, from 12:00am to 11:59pm).
select count(*)
from TableA
where Field1 = 'A'
and Field2 = 'B'
and DateField = TODAY
I need help with the date grouping. Any help would be appreciated.
In a nutshell:
and DateField >= TODAY and DateField < TOMORROW
specifically:
and DateField >= cast(current_timestamp as date) and DateField < cast(dateadd(dd, 1, current_timestamp) as date)
You can also use BETWEEN, but I prefer the exclusive upper bound. I find that approach less prone to mistakes, such as using 11:59pm as the end time and leaving records from 23:59:00.001 to 23:59:59.997 without a home, or matching records that actually fall on midnight tomorrow.
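To see why the exclusive upper bound is safer, here is a small sketch in Python rather than T-SQL. The helper names `day_bounds` and `in_half_open` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def day_bounds(now):
    """Return (midnight today, midnight tomorrow) for the day containing now."""
    start = datetime(now.year, now.month, now.day)
    return start, start + timedelta(days=1)


def in_half_open(ts, start, end):
    """Equivalent of: DateField >= start AND DateField < end."""
    return start <= ts < end


now = datetime(2024, 5, 17, 14, 30)
start, end = day_bounds(now)

late = datetime(2024, 5, 17, 23, 59, 59, 500000)  # half a second before midnight
midnight_tomorrow = datetime(2024, 5, 18)

# Half-open range: keeps the late record, rejects midnight tomorrow.
print(in_half_open(late, start, end), in_half_open(midnight_tomorrow, start, end))

# BETWEEN-style inclusive range ending at 23:59 loses the late record...
between_end = datetime(2024, 5, 17, 23, 59)
print(start <= late <= between_end)

# ...and an inclusive range ending at midnight tomorrow wrongly matches it.
print(start <= midnight_tomorrow <= end)
```

The half-open form needs no knowledge of the column's time precision, which is why it tends to survive type changes.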
Is this what you want?
select count(*)
from TableA
where Field1 = 'A' and Field2 = 'B' and
DateField = cast(getdate() as date);
Is there another way to rewrite or improve this query, to make it with less typo and, if possible, better performance:
Select
(Select Sum(value) from table1
where code = 'B2'
and date between DATE '2017-01-01'
and DATE '2017-03-31')
+
(Select Sum(value) from table2
where code = 'B2'
and date between DATE '2017-04-01'
and DATE '2017-04-30')
I also tried with union all but this still is not what I need:
Select Sum(value)
from (Select code, value from table1
Where date between DATE '2017-01-01'
and DATE '2017-03-31')
union all
(Select code, value from table1
Where date between DATE '2017-04-01'
and DATE '2017-04-30')
where code = 'B2'
Thanks
Your first query is fine . . . assuming you have a from dual at the end.
For performance, you want indexes on table1(code, date, value) and table2(code, date, value). Note that the order of the columns in the indexes is important.
If by "typo" you mean that the criterion code = 'B2' appears twice in your query, you can move it to your FROM clause. Also be aware that a subquery can return NULL: SUM over zero matching rows yields NULL, and NULL plus any number is NULL. Use NVL (or COALESCE) to deal with this.
select
nvl((select sum(value) from table1
where code = x.code and date between date '2017-01-01' and date '2017-03-31'), 0)
+
nvl((select sum(value) from table2
where code = x.code and date between date '2017-04-01' and date '2017-04-30'), 0)
from (select 'B2' as code from dual) x;
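The NULL behaviour is easy to reproduce. Below is a minimal sketch using Python's sqlite3 module instead of Oracle, so COALESCE stands in for NVL and no `from dual` is needed; the sample rows are invented, and table2 is deliberately left empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (code TEXT, value REAL);
    CREATE TABLE table2 (code TEXT, value REAL);
    INSERT INTO table1 VALUES ('B2', 10), ('B2', 5);
    -- table2 has no rows for code 'B2'
    """
)

# Without a NULL guard: SUM over no rows is NULL, so the whole sum is NULL.
(raw,) = conn.execute(
    """
    SELECT (SELECT SUM(value) FROM table1 WHERE code = 'B2')
         + (SELECT SUM(value) FROM table2 WHERE code = 'B2')
    """
).fetchone()

# With COALESCE: the empty side counts as 0.
(safe,) = conn.execute(
    """
    SELECT COALESCE((SELECT SUM(value) FROM table1 WHERE code = 'B2'), 0)
         + COALESCE((SELECT SUM(value) FROM table2 WHERE code = 'B2'), 0)
    """
).fetchone()
conn.close()

print(raw, safe)
```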
I have a column in my database called "begin_date".
I am trying to select records where begin_date is on or after a specific date. I wrote:
select * from Table_Name
where begin_date >= '1/1/2014'
However, it returns the error message "String to date conversion error".
I am not sure how to modify the query to make it work.
Thanks!
Try using ISO (8601) standard date formats:
select *
from Table_Name
where begin_date >= '2014-01-01';
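If the date arrives from application code as a string like '1/1/2014', it can be normalised to the ISO form before it reaches the query. This is a small Python sketch; the helper name `to_iso_date` is hypothetical, and the month/day/year input order is an assumption, since the original literal is ambiguous.

```python
from datetime import datetime


def to_iso_date(text, fmt="%m/%d/%Y"):
    """Convert a date string in the given format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, fmt).date().isoformat()


iso = to_iso_date("1/1/2014")
print(iso)
```

The ISO string can then be passed as a bound parameter instead of being pasted into the SQL text.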
select *
from table
where data >= '01/01/2014'