I am trying to run multiple MySQL queries to get monthly data from a table, using a for/while loop over a set of dates.
For example, I have a list of date ranges:
a = [
(2022-01-01, 2022-01-31),
(2022-02-01, 2022-02-28),
(2022-03-01, 2022-03-31),
...,
(2022-12-01, 2022-12-31)
]
I would like to loop over the dates so that I don't have to run the query 12 times manually. But I get a SQL compilation error: invalid identifier A:
for i in range(12):
sql_query = select *
from table
where date between a[i][i] and a[i][i+1];
and then pass this query to the Snowflake database.
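The "invalid identifier A" error is probably raised because a[i][i] reaches the database as literal text inside the query, so the database tries to resolve a as a column name. (The indices would also need to be a[i][0] and a[i][1].) A minimal sketch of one way around this is to generate the month ranges in Python and hand each pair to the driver as bound parameters. The table name sales and column sale_date are hypothetical, and sqlite3 is used here only as a stand-in for the Snowflake connection; the placeholder style accepted by the Snowflake connector may differ, so check its documentation.

```python
import calendar
import sqlite3
from datetime import date


def month_ranges(year):
    """Return a list of (first_day, last_day) ISO date strings, one per month."""
    ranges = []
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        ranges.append((date(year, month, 1).isoformat(),
                       date(year, month, last_day).isoformat()))
    return ranges


# Hypothetical table and column names; sqlite3 stands in for the real connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2022-01-15", 10.0), ("2022-02-28", 20.0), ("2022-03-01", 30.0)])

# The placeholders keep the Python values out of the SQL text itself.
query = "SELECT * FROM sales WHERE sale_date BETWEEN ? AND ?"

# One query per month, keyed by the first day of that month.
monthly_rows = {}
for start, end in month_ranges(2022):
    monthly_rows[start] = conn.execute(query, (start, end)).fetchall()
```

Iterating over the (start, end) pairs directly avoids index arithmetic altogether.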
I'm in the process of converting some SAS code to PySpark; we previously used a macro variable for the WHERE clause in this code. In adapting it to PySpark, I'm trying to pass a list of dates to the WHERE clause, but I keep getting errors. I want the SQL code to pull all data from those 3 months. Any pointers?
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN '{}')
) as table1""".format(month_list)
Pass the list as a tuple so that it is formatted with valid SQL syntax:
month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN {})
) as table1""".format(tuple(month_list))
Also, you don't need the quotes around the placeholder in the IN clause.
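One caveat with the tuple approach: a one-element tuple is rendered by Python as ('202107',), and the trailing comma is generally not valid SQL. A minimal sketch of a small helper that builds the IN list explicitly is shown below; in_clause is a hypothetical name, not part of PySpark.

```python
def in_clause(values):
    """Build a quoted, comma-separated SQL IN list such as ('a', 'b')."""
    if not values:
        raise ValueError("IN list must not be empty")
    # Double any single quote so a value cannot terminate the literal early.
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in values)
    return "(" + quoted + ")"


month_list = ['202107', '202108', '202109']
sql_query = """ (SELECT *
FROM Table_Blah
WHERE (to_char(DateVariable,'yyyymm') IN {})
) as table1""".format(in_clause(month_list))
```

For three months this produces the same IN list as tuple(month_list), and it still produces a valid list when only one month is passed.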
I have written this EXCEPT query to get the difference in records between two Hive tables from a Databricks notebook. (I am trying to get the same result as in MSSQL, i.e. only the differences in the result set.)
select PreqinContactID,PreqinContactName,PreqinPersonTitle,EMail,City
from preqin_7dec.PreqinContact where filename='InvestorContactPD.csv'
except
select CONTACT_ID,NAME,JOB_TITLE,EMAIL,CITY
from preqinct.InvestorContactPD where contact_id in (
select PreqinContactID from preqin_7dec.PreqinContact
where filename='InvestorContactPD.csv')
But the returned result set also contains matching records. The record I have shown above appears in the result set, but when I checked it separately by contact_id it is the same in both tables, so I am not sure why EXCEPT is also returning the matching record.
I just want to know how to use EXCEPT, or any other difference-finding command, in a Databricks notebook using SQL.
I want to see nothing in the result set if the source and target data are the same.
EXCEPT works perfectly well in Databricks as this simple test will show:
val df = Seq((3445256, "Avinash Singh", "Chief Manager", "asingh#gmail.com", "Mumbai"))
.toDF("contact_id", "name", "job_title", "email", "city")
// Save the dataframe to a temp view
df.createOrReplaceTempView("tmp")
df.show
The SQL test:
%sql
SELECT *
FROM tmp
EXCEPT
SELECT *
FROM tmp;
This query will yield no results. Is it possible you have some leading or trailing spaces, for example? String comparison in Spark is also case-sensitive, so that could be causing your issue too. Try a case-insensitive test by applying the LOWER function to all columns, e.g.
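A minimal sketch of the effect, using Python's sqlite3 module as a stand-in for Spark SQL (the tables src and tgt and their contents are made up for illustration): a trailing space and a difference in letter case are enough to make EXCEPT return the row, and normalising both sides with TRIM and LOWER removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (contact_id INTEGER, name TEXT, city TEXT)")
conn.execute("CREATE TABLE tgt (contact_id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO src VALUES (3445256, 'Avinash Singh', 'Mumbai')")
# Same record, but with a trailing space and different letter case.
conn.execute("INSERT INTO tgt VALUES (3445256, 'avinash singh ', 'MUMBAI')")

# Plain EXCEPT compares the raw strings, so the row counts as different.
plain = conn.execute(
    "SELECT contact_id, name, city FROM src "
    "EXCEPT SELECT contact_id, name, city FROM tgt"
).fetchall()

# Normalising both sides makes the two rows compare equal.
normalized = conn.execute(
    "SELECT contact_id, LOWER(TRIM(name)), LOWER(TRIM(city)) FROM src "
    "EXCEPT SELECT contact_id, LOWER(TRIM(name)), LOWER(TRIM(city)) FROM tgt"
).fetchall()
```

Here plain contains the source row while normalized is empty, which is the behaviour to look for when two records appear identical on inspection.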
I have the following table schema prepared by AWS Glue.
When I query the table using SELECT * FROM "vietnam-property-develop"."sell" limit 10;, it throws an error:
HIVE_BAD_DATA: Error parsing field value '{"area":"85
m²","date":"14/01/2020","datetime":"2020-01-18
00:42:28.488576+00:00","address":"Quan Hoa - Cầu Giấy","price":"20
Tỷ","cat":"Bán nhà mặt
phố","lon":"105.7976502","avatar":"","id":"24169794","title":"Chính
chủ cần bán nhà mặt phố nguyễn văn huyên Quan Hoa Cầu Giấy, 2 tầng, dt
85m2. LH 0903233723","lat":"21.0376771","room":"0"}' for field 4:
org.openx.data.jsonserde.json.JSONObject cannot be cast to
java.lang.Double
Then I tried to query just the title column using SELECT title FROM "vietnam-property-develop"."sell" limit 10;
It returns a result I didn't expect. The query seems to return the whole JSON files instead of just the title column, and the number of rows is 4 rather than 10, no matter how I modify the query.
I'm querying a SQL table using PySpark.
I have a SQL table with two columns (value, isDelayed), where "value" is of type double and "isDelayed" is either 0 or 1. How do I write a PySpark aggregation query that gives the sum of "value" where "isDelayed" is 1?
I've already tried the code below, which gives an error:
def __main__(self, data):
    delayedData = data.where(col('isDelayed').cast('int')==='1')
    groupByIsDelayed = delayedData.agg(sum(total))
    return groupByIsDelayed
I'm getting
"Syntax Error: invalid syntax"
on the line below:
delayedData = data.where(col('isDelayed').cast('int')==='1')
Replace data.where(col('isDelayed').cast('int')==='1') with data.where(col('isDelayed').cast('int') == 1):
use only two = signs (the equality operator in Python is ==)
write 1 without quotes (because you are comparing an int, not a string)
or use
data.where("isDelayed=1")
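The === operator is the column equality operator in Spark's Scala API, which may be where it came from; Python has no such operator. A minimal sketch that checks this without needing a Spark session is to ask Python's own parser whether each line is valid. The helper name is_valid_python is made up for this example, and the expressions are only parsed, never evaluated.

```python
import ast


def is_valid_python(source):
    """Return True if the source parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# The expressions are kept as strings; nothing here is executed.
triple_equals = "data.where(col('isDelayed').cast('int') === '1')"
double_equals = "data.where(col('isDelayed').cast('int') == 1)"
```

is_valid_python(triple_equals) is False and is_valid_python(double_equals) is True, which matches the "invalid syntax" error reported for the original line.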
I am trying to fetch results from my SQLite database by providing a date range.
I have been able to fetch results by providing 3 filters:
1. Name (textfield1)
2. From (date)(textfield2)
3. To (date)(textfield3)
I use these field values, taken from a form, to insert the matching rows into a table Temp with the following code:
Statement statement6 = db.createStatement("INSERT INTO Temp(date,amount_bill,narration) select date,amount,narration from Bills where name=\'"+TextField1.getText()+"\' AND substr(date,7)||substr(date,4,2)||substr(date,1,2) <= substr (\'"+TextField3.getText()+"\',7)||substr (\'"+TextField3.getText()+"\',4,2)||substr (\'"+TextField3.getText()+"\',1,2) AND substr(date,7)||substr(date,4,2)||substr(date,1,2) >= substr (\'"+TextField2.getText()+"\',7)||substr (\'"+TextField2.getText()+"\',4,2)||substr (\'"+TextField2.getText()+"\',1,2) ");
statement6.prepare();
statement6.execute();
statement6.close();
Now if I enter the following input in my form for the above filters:
1.Ricky
2.01/02/2012
3.28/02/2012
it fetches the rows with dates within this range perfectly.
But now I want to insert the rows whose dates fall below or above the range provided.
I have tried the code below, but it doesn't return any results, and I can't figure out where the error is.
The code is meant to find entries with a date earlier than 01/02/2012 or later than 28/02/2012.
Statement statementVII = db.createStatement("INSERT INTO Temp5(date,amount_rec,narration) select date,amount,narration from Bills where name=\'"+TextField1.getText()+"\' AND substr(date,7)||substr(date,4,2)||substr(date,1,2) < substr (\'"+TextField2.getText()+"\',7)||substr (\'"+TextField2.getText()+"\',4,2)||substr (\'"+TextField2.getText()+"\',1,2) AND substr(date,7)||substr(date,4,2)||substr(date,1,2) > substr (\'"+TextField3.getText()+"\',7)||substr (\'"+TextField3.getText()+"\',4,2)||substr (\'"+TextField3.getText()+"\',1,2)");
statementVII.prepare();
statementVII.execute();
statementVII.close();
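Can anyone who knows about this please guide me? Thanks.
You need to use an OR clause together with brackets. With AND, a date would have to be both earlier than the lower bound and later than the upper bound at the same time, which is impossible, so no rows match:
WHERE name='....' AND (yourDateField<yourLowerDate OR yourDateField>yourHigherDate)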
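A minimal sketch of the bracketed OR condition, using Python's sqlite3 module rather than the Java API from the question (the sample rows and the helper as_sortable are made up for illustration). It keeps the substr rearrangement from the question and passes the form values as named parameters instead of concatenating them into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bills (name TEXT, date TEXT, amount REAL, narration TEXT)")
conn.executemany(
    "INSERT INTO Bills VALUES (?, ?, ?, ?)",
    [
        ("Ricky", "15/01/2012", 100.0, "before range"),
        ("Ricky", "10/02/2012", 200.0, "inside range"),
        ("Ricky", "05/03/2012", 300.0, "after range"),
    ],
)


def as_sortable(expr):
    """Rearrange dd/mm/yyyy into yyyymmdd so string comparison follows date order."""
    return f"substr({expr},7)||substr({expr},4,2)||substr({expr},1,2)"


# The brackets group the two date tests so that the name filter applies to both.
query = (
    "SELECT date, amount, narration FROM Bills "
    "WHERE name = :name "
    f"AND ({as_sortable('date')} < {as_sortable(':lower')} "
    f"OR {as_sortable('date')} > {as_sortable(':upper')})"
)

rows = conn.execute(
    query, {"name": "Ricky", "lower": "01/02/2012", "upper": "28/02/2012"}
).fetchall()
```

With the sample data, rows holds the entries dated 15/01/2012 and 05/03/2012 and leaves out the one inside the range.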