Does Spark SQL provide an API to parse a SQL statement against a corresponding DDL and infer the data types of the select list?

I'm reviewing Spark SQL for a project and I see all the pieces of the API I need (SQL parser, Dataset, Encoder, LogicalPlan, etc.), but I'm having difficulty tying them together the way I'd like.
Essentially I want the following functionality:
var ddl = parseDDL(RAW_DDL);
var query = parseQuery("SELECT c1, c2 FROM t1 WHERE c2='value'", ddl);
var selectFields = query.getSelectFields();
for (var field : selectFields) {
    var name = field.getName();
    var type = field.getType(); // <~~~ want this in terms of `t1` from the DDL
    ...
}
The type information for the select list in terms of the DDL is what I'm after.
Ideally I'd like a soup-to-nuts example with Spark SQL, if possible.
UPDATE
To clarify let's say I have an SQL schema file with several CREATE TABLE statements:
File: com/abc/MovieDb.sql
CREATE TABLE Movie (
    Title varchar(255),
    Year integer,
    etc.
);
CREATE TABLE Actor (
    FirstName varchar(255),
    LastName varchar(255),
    etc.
);
etc.
I want to use Spark SQL to parse a number of arbitrary SQL SELECT statements against this schema. Importantly, I want to get type information about the select list of each query in terms of the Movie, Actor, etc. tables and columns in the schema. For example:
SELECT Title, Year FROM Movie WHERE Year > 1990
I want to parse this query against the schema and get the type information for the select list. Again, the queries are arbitrary, but the schema is stable, so I'm after something like:
var parser = createParser(schema);
var query = parser.parseQuery(arbitraryQuery);
var selectedFields = query.getSelectedFields();
for (var field : selectedFields) {
    var name = field.getName();
    var type = field.getType();
}
Most important is the field.getType() call.
I assumed this would be an easy yes-or-no question, but perhaps my use case is off the beaten path. Time to dive into it myself...

To get column information, here is what can be done. Suppose you have a DataFrame with columns a, b, c, d:
val inputDf = Seq(("foo", "Bar", 0, 0.0)).toDF("a", "b", "c", "d")
val newDf = inputDf.select("a", "c")
val columnInfo = newDf.dtypes // should give you something like (("a","StringType"), ("c","IntegerType"))
Again, this is not tested code, but generally this is how you can get the column names and their types.
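Extending that idea to the original question: a sketch (untested here, and assuming a local PySpark installation) is to register each table from the DDL as an empty temp view, then let Spark analyze the query. spark.sql() is lazy, so the resolved schema of the select list is available without ever executing the query. The Movie table and sample query come from the question above; varchar(255) is shown as STRING, Spark's closest type.

```python
# Sketch: infer select-list types against a DDL-defined schema without
# running the query. Assumes a local PySpark installation.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

# Register the schema as an empty temp view (schema given as a DDL string).
spark.createDataFrame([], "Title STRING, Year INT").createOrReplaceTempView("Movie")

# spark.sql() parses and analyzes the query lazily; nothing executes,
# but the resolved schema is already available.
df = spark.sql("SELECT Title, Year FROM Movie WHERE Year > 1990")
for field in df.schema.fields:
    print(field.name, field.dataType)  # e.g. Title StringType, Year IntegerType
```

The same pattern scales to the stable-schema case: run every CREATE TABLE from the schema file once at startup, then analyze each arbitrary query as it arrives.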

Related

sql select query for impala column with map data type

For a table, say details, with the schema as,
Column    Type
name      string
desc      map<int, string>
How do I form a select query - to be run by a Java program - which expects the result set in this structure?
name     desc
Bob      {1, "home"}
Alice    {2, "office"}
Bearing in mind Impala's limitations with regard to complex types (documented here):
The result set of an Impala query always contains all scalar types;
the elements and fields within any complex type queries must be
"unpacked" using join queries.
i.e. select * from details; would only return results without the map-typed (complex) column.
The closest I've come up with is select name, map_col.key, map_col.value from details, details.desc map_col;. The result set is obviously not in the expected format.
Thanks in advance.
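Since Impala can only hand back the map unpacked into key/value rows via that join, the client program has to reassemble the maps itself. A stdlib sketch of the regrouping step (the rows literal is a hypothetical stand-in for the JDBC result set of the join query above):

```python
# Reassemble Impala's unpacked (name, key, value) rows into one map per
# name on the client side. `rows` stands in for the result of:
#   SELECT name, map_col.key, map_col.value FROM details, details.desc map_col
rows = [
    ("Bob", 1, "home"),
    ("Alice", 2, "office"),
]

grouped = {}
for name, key, value in rows:
    # setdefault creates the per-name dict on first sight of each name.
    grouped.setdefault(name, {})[key] = value

print(grouped)  # {'Bob': {1: 'home'}, 'Alice': {2: 'office'}}
```

The same grouping is straightforward to port to the Java client with a Map<String, Map<Integer, String>>.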

Create SQLite view as a union with source table name field [duplicate]

I want to write a query that examines all the tables in an SQLite database for a piece of information in order to simplify my post-incident diagnostics (performance doesn't matter).
I was hoping to write a query that uses the sqlite_master table to get a list of tables and then query them, all in one query:
SELECT Name
FROM sqlite_master
WHERE Type = 'table' AND (
    SELECT count(*)
    FROM Name
    WHERE conditions
) > 0;
However, when attempting to execute this style of query, I receive the error "no such table: Name". Is there an alternate syntax that allows this, or is it simply not supported?
SQLite is designed as an embedded database, i.e., to be used together with a 'real' programming language.
To be able to use such dynamic constructs, you must go outside of SQLite itself:
cursor.execute("SELECT name FROM sqlite_master")
rows = cursor.fetchall()
for row in rows:
    sql = "SELECT ... FROM {} WHERE ...".format(row[0])
    cursor.execute(sql)
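A self-contained, runnable version of that loop using the sqlite3 stdlib module. The two tables and the search condition (a column literally named info holding the value 'needle') are hypothetical stand-ins for the real schema and conditions:

```python
# Scan every user table for rows matching a condition, using the
# sqlite_master catalog to enumerate tables. Schema is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (info TEXT);
    CREATE TABLE t2 (info TEXT);
    INSERT INTO t1 VALUES ('needle');
    INSERT INTO t2 VALUES ('hay');
""")

cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
matches = []
for (table,) in cursor.fetchall():
    # Table names cannot be bound as ? parameters, hence the format();
    # the value being searched for is bound normally.
    count_cur = conn.execute(
        "SELECT count(*) FROM {} WHERE info = ?".format(table), ("needle",)
    )
    if count_cur.fetchone()[0] > 0:
        matches.append(table)

print(matches)  # ['t1']
```

Note the split: only the searched value is passed as a bound parameter; the table name has to be interpolated into the SQL text, which is exactly why this cannot be expressed inside a single SQLite query.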

Is it possible to look up a table-valued function's return columns in SAP HANA's dictionary views?

I've created a table-valued function in SAP HANA:
CREATE FUNCTION f_tables
RETURNS TABLE (
    column_value INTEGER
)
LANGUAGE SQLSCRIPT
AS
BEGIN
    RETURN SELECT 1 column_value FROM SYS.DUMMY;
END
Now I'd like to be able to discover the function's table type using the dictionary views. I can run this query here:
select *
from function_parameters
where schema_name = '[xxxxxxxxxx]'
and function_name = 'F_TABLES'
order by function_name, position;
Which will yield something like:
PARAMETER_NAME        TABLE_TYPE_SCHEMA  TABLE_TYPE_NAME
--------------------  -----------------  -------------------------
_SYS_SS2_RETURN_VAR_  [xxxxxxxxxx]       _SYS_SS_TBL_[yyyyyyy]_RET
Unfortunately, I cannot seem to be able to look up that _SYS_SS_TBL_[yyyyyyy]_RET table in SYS.TABLES (and TABLE_COLUMNS), SYS.VIEWS (and VIEW_COLUMNS), SYS.DATA_TYPES, etc. in order to find the definitions of the individual columns.
Note that explicitly named table types created using CREATE TYPE ... do appear in SYS.TABLES...
Is there any way for me to formally look up a table-valued function's return columns? I'm not interested in parsing the source, obviously.
These kinds of tables are internal row-store tables, so you can only find your _SYS_SS_TBL_[yyyyyyy]_RET table in SYS.RS_TABLES_. This will give you some basic information, including a column ID (CID); that value is what you need to look up the column information.
For example, if your CID is 100, you can find the column information in the RS_COLUMNS_ table with this query:
SELECT * FROM SYS.RS_COLUMNS_ WHERE CID = 100

LINQ to SQL: update on table data not working

I have a LINQ query which is intended to update the table concerned.
The code is as follows:
LINQHelperDataContext PersonalDetails = new LINQHelperDataContext();
var PerDetails1 = (from details in PersonalDetails.W_Details_Ts
                   where details.UserId == userId
                   select details).First();
PerDetails1.Side = "Bridge";
PerDetails1.TotalBudget = 4000000;
PersonalDetails.SubmitChanges();
However, this change/update does not get reflected in the DB, and no exception is thrown. Please suggest.
Make sure W_Details_Ts has one (or more) member properties marked as the primary key. LINQ to SQL can't generate update or delete statements if it does not know the underlying table's PK member(s).