I'd like to access the results of a SQL query that is sent to a Redshift database from Scala.
val test = spark.sql("""
SELECT * FROM my_table
LIMIT 100""")
test.show
When I check the class, it appears to be a SQL Dataset.
test.getClass
res52: Class[_ <: org.apache.spark.sql.DataFrame] = class org.apache.spark.sql.Dataset
Unfortunately, it returns this error:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
How do I show the top results from my SQL Dataset?
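For what it's worth, a DataFrame in Spark is just an alias for Dataset[Row], so the class you see is expected. Below is a minimal sketch of the standard actions for pulling top rows; note that the TreeNodeException above is thrown when the plan executes, so each of these will fail the same way until the underlying query itself succeeds:
// show(n) prints the first n rows to the console
test.show(10)
// take(n) returns the first n rows as an Array[Row]
val firstRows = test.take(10)
// limit(n) returns a new Dataset that can be collected to the driver
val top10 = test.limit(10).collect()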
I am working with the R programming language.
Normally, when I want to get the summary of a table, I can use something like the "str()" function or the "summary()" function:
str(my_table)
summary(my_table)
However, now I am trying to do this with tables on a server.
For instance, I am trying to get the summaries of variable types for a specific table (e.g. "my_table") on a server. I found a very indirect way to do this:
#load libraries
library(odbc)
library(RODBC)
library(DBI)
#establish a connection and name it as "dbhandle"
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
dbColumnInfo(rs)
My Question: Is there a more "direct" way to do this? For example, can I get information about each column (e.g. whether the column is integer, character, date, etc.) in a table without first sending the query and then requesting the information? Can I do this directly?
Thanks!
You could try using fetch() from "RMySQL" to turn your SQL query result into an R object (e.g. a data frame):
library(RMySQL)
rs <- dbSendQuery(dbhandle, 'select * from my_table limit 1')
# Get the results from MySQL into R
my_table = fetch(rs, n=-1)
# clear result
dbClearResult(rs)
rm(rs)
Then use the functions you describe:
str(my_table)
summary(my_table)
I am using the boto3 library with execute_statement to get data from an RDS cluster via the Data API.
The query works fine if I select 1 or 2 columns, but as soon as I add another column to the query, it returns an error: (BadRequestException) permission denied for relation table_name
I have checked in pgAdmin that the permissions for the user I am using are intact to query the whole database.
The function included in the call:
def execute_query(self, sql_query, sql_parameters=[]):
    """
    Aurora Data API execute query. Generally used for select statements.
    :param sql_query: Query
    :param sql_parameters: parameters in sql query
    :return: Data API response
    """
    client = self.api_access()
    response = client.execute_statement(
        resourceArn=RESOURCE_ARN,
        secretArn=SECRET_ARN,
        database='db_name',
        sql=sql_query,
        includeResultMetadata=True,
        parameters=sql_parameters)
    return response
Function call that works (no errors):
query = '''
SELECT id
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Function call that fails with the above error:
query = '''
SELECT id,name,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
Is there a horizontal (column) limit on what we can get from the Data API using boto3? I know there is a 1 MB limit, but according to the documentation it should still return something if the result exceeds that limit.
The backend is Postgres on RDS.
UPDATE:
I can select the same column multiple times and it's not a problem:
query = '''
SELECT id,event,event,event,event,event
FROM schema_name.table_name
limit 1
'''
print(query)
result = conn.execute_query(query)
print(result)
So this means there are some columns that I cannot select.
I didn't know there was column-level security on some tables. If column-level security is set up in Postgres for the user you are using, it is obvious that you cannot select those columns.
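For reference, this is roughly what a column-level grant looks like in Postgres; the role and column names here are hypothetical:
-- app_user may read only id and event from this table
GRANT SELECT (id, event) ON schema_name.table_name TO app_user;
-- selecting any other column (e.g. name) then fails with:
--   permission denied for relation table_name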
I have written this EXCEPT query to get the difference in records between two Hive tables from a Databricks notebook. (I am trying to get the result we get in MSSQL, i.e. only the differences in the result set.)
select PreqinContactID,PreqinContactName,PreqinPersonTitle,EMail,City
from preqin_7dec.PreqinContact where filename='InvestorContactPD.csv'
except
select CONTACT_ID,NAME,JOB_TITLE,EMAIL,CITY
from preqinct.InvestorContactPD where contact_id in (
select PreqinContactID from preqin_7dec.PreqinContact
where filename='InvestorContactPD.csv')
But the result set returned also contains matching records. The record I have shown above comes back in the result set, but when I checked it separately based on contact_id it is the same, so I am not sure why EXCEPT is returning matching records as well.
I just want to know how we can use EXCEPT, or any other difference-finding command, in a Databricks notebook using SQL.
I want to see nothing in the result set if the source and target data are the same.
EXCEPT works perfectly well in Databricks as this simple test will show:
val df = Seq((3445256, "Avinash Singh", "Chief Manager", "asingh#gmail.com", "Mumbai"))
.toDF("contact_id", "name", "job_title", "email", "city")
// Save the dataframe to a temp view
df.createOrReplaceTempView("tmp")
df.show
The SQL test:
%sql
SELECT *
FROM tmp
EXCEPT
SELECT *
FROM tmp;
This query will yield no results. Is it possible you have some leading or trailing spaces, for example? Spark is also case-sensitive, so that could also be causing your issue. Try a case-insensitive test by applying the LOWER function to all columns, e.g. the sketch below.
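A sketch of such a test, using the column names from your query; TRIM is added to rule out stray whitespace at the same time (add your contact_id filter back as needed):
%sql
SELECT PreqinContactID, TRIM(LOWER(PreqinContactName)), TRIM(LOWER(PreqinPersonTitle)), TRIM(LOWER(EMail)), TRIM(LOWER(City))
FROM preqin_7dec.PreqinContact
WHERE filename = 'InvestorContactPD.csv'
EXCEPT
SELECT CONTACT_ID, TRIM(LOWER(NAME)), TRIM(LOWER(JOB_TITLE)), TRIM(LOWER(EMAIL)), TRIM(LOWER(CITY))
FROM preqinct.InvestorContactPD;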
I want to execute the aggregate function MAX on a table's ID column residing in MS SQL. I am using Spark SQL 1.6 and the JDBC pushdown-query approach, as I don't want Spark SQL to pull all the data to the Spark side and compute MAX(ID) there. But when I execute the code below I get the exception below, whereas if I use SELECT * FROM in the code it works as expected.
Code:
def getMaxID(sqlContext: SQLContext, tableName: String) = {
  val pushdown_query = s"(SELECT MAX(ID) FROM ${tableName}) as t"
  val maxID = sqlContext.read
    .jdbc(url = getJdbcProp(sqlContext.sparkContext).toString,
      table = pushdown_query,
      properties = getDBConnectionProperties(sqlContext.sparkContext))
    .head().getLong(0)
  maxID
}
Exception:
Exception in thread "main" java.sql.SQLException: No column name was specified for column 1 of 't'.
at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2421)
at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:671)
at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:505)
at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1029)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:124)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
This exception is not related to Spark. You have to provide an alias for the aggregated column:
val pushdown_query = s"(SELECT MAX(ID) AS max_id FROM ${tableName}) as t"
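Wired back into the original function, the fix is just that alias; a minimal sketch, keeping the helper names (getJdbcProp, getDBConnectionProperties) from the question:
def getMaxID(sqlContext: SQLContext, tableName: String): Long = {
  // Alias the aggregate so the driver can name column 1 of the derived table "t"
  val pushdownQuery = s"(SELECT MAX(ID) AS max_id FROM ${tableName}) as t"
  sqlContext.read
    .jdbc(getJdbcProp(sqlContext.sparkContext).toString, pushdownQuery,
      getDBConnectionProperties(sqlContext.sparkContext))
    .head()
    .getLong(0)
}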
I am using SquirrelSQL to write and execute SQL queries on a Netezza database. Using Netezza's spatial capabilities (which are essentially the same as those of PostGIS) I've executed a query and returned a single result that contains a geometry. Here's the query, for reference:
SELECT t.SHAPE
FROM (SELECT * FROM OS_AB_PLUS..E12_ADDRESSBASE WHERE POSTCODE = 'RH1 6NE'
AND PAO_START_NUMBER = '14') as a, OS_TOPO..TOPOGRAPHICAREA as t
WHERE inza..ST_Within(a.shape, t.shape) = TRUE
My issue is that the geometry field, which should contain the polygon coordinates represented as Well-Known Text (WKT), looks instead like this:
g¹ AË Affff¬0AÍÌÌÌî0AÒ 3333Ê AÍÌÌÌî0A» Aë0Afffæ» AffffÒ0A¹ AÒ0A333³¹ A3333¿0AŒ AffffÀ0AÍÌÌLŒ Affff¬0AË A¯0AëQ8Ê A3333í0A3333Ê AÍÌÌÌî0A
I can't seem to find anywhere in SquirrelSQL to specify the encoding of VARCHAR columns, and I've seen the column returned without encoding issues in Aginity (another SQL client). Any suggestions on how to proceed would be much appreciated.
Turns out my issue was not really related to encoding at all. The human-readable version of the geometry in a PostGIS-like database will only be returned when ST_AsText is used in the select statement. So my SQL query becomes:
SELECT inza..ST_AsText(t.SHAPE)
FROM (SELECT * FROM OS_AB_PLUS..E12_ADDRESSBASE WHERE POSTCODE = 'RH1 6NE'
AND PAO_START_NUMBER = '14') as a, OS_TOPO..TOPOGRAPHICAREA as t
WHERE inza..ST_Within(a.shape, t.shape) = TRUE
Which returns, as intended:
POLYGON ((526696.15 148931.9, 526703.94 148932.34, 526703.8 148935.2, 526705.5 148935.3, 526705.4 148937.8, 526695.9 148937.35, 526696.15 148931.9))