How to generate executable TPC-DS queries? - sql

I have downloaded the DSGEN tool from the TPC-DS web site and already generated the tables and loaded the data into Oracle XE.
I am using the following command to generate the SQL statements :
dsqgen -input ..\query_templates\templates.lst -directory ..\query_templates -dialect oracle -scale 1
However, No matter how I adjust the command I always get this error message :
ERROR: A query template list must be supplied using the INPUT option
Can anybody help?

Apparently you need to use / rather than - for the flags for the Windows executable:
dsqgen /input ..\query_templates\templates.lst /directory ..\query_templates
/dialect oracle /scale 1

Related

Deploy sql workflow with DBX

I am developing deployment via DBX to Azure Databricks. In this regard I need a data job written in SQL to happen everyday. The job is located in the file data.sql. I know how to do it with a python file. Here I would do the following:
build:
python: "pip"
environments:
default:
workflows:
- name: "workflow-name"
#schedule:
quartz_cron_expression: "0 0 9 * * ?" # every day at 9.00
timezone_id: "Europe"
format: MULTI_TASK #
job_clusters:
- job_cluster_key: "basic-job-cluster"
<<: *base-job-cluster
tasks:
- task_key: "task-name"
job_cluster_key: "basic-job-cluster"
spark_python_task:
python_file: "file://filename.py"
But how can I change it so I can run a SQL job instead? I imagine it is the last two lines of code (spark_python_task: and python_file: "file://filename.py") which needs to be changed.
There are various ways to do that.
(1) One of the most simplest is to add a SQL query in the Databricks SQL lens, and then reference this query via sql_task as described here.
(2) If you want to have a Python project that re-uses SQL statements from a static file, you can add this file to your Python Package and then call it from your package, e.g.:
sql_statement = ... # code to read from the file
spark.sql(sql_statement)
(3) A third option is to use the DBT framework with Databricks. In this case you probably would like to use dbt_task as described here.
I found a simple workaround (although might not be the prettiest) to simply change the data.sql to a python file and run the queries using spark. This way I could use the same spark_python_task.

When trying to get the source of a function in Postgres using psql, what does the error "column p.proisagg does not exist" mean?

Background:
using postgres 11 on RDS, interface is psql on a Centos 7 box; objective is to show the source of certain stored procs / functions so that I can work with them
Problem description : When I attempt to list / show the source of a given stored function using the \df+ command which I understand to be correct for this use based on [official docs here](https://www.postgresql.org/docs/current/app-psql.html], an error is given as shown:
psql=> \df+ schema_foo.proc_bar;
ERROR: column p.proisagg does not exist
LINE 6: WHEN p.proisagg THEN 'agg'
I have no clue how to interpret this; the function in question does not contain the snippet of logic shown in the error, nor the column referenced there p.proisagg. I have verified this by opening the function in vim with \ef.
My guess based on several unrelated github issues that mention this same error for example is that it is in reference to some schema code internal to postgres.
Summary: in short, I can view the source of the functions using \ef, so my work is not impaired from a practical standpoint, however I wish to understand this error and why I'm encountering it with \df+.
I had the same issue and ran these 2 commands to fix it
sed -i "s/NOT pp.proisagg/pp.prokind='f'/g" /usr/share/phpPgAdmin/classes/database/Postgres.php
sed -i "s/NOT p.proisagg/p.prokind='f'/g" /usr/share/phpPgAdmin/classes/database/Postgres.php

Running SQL in Stata

I am trying to load data from SQL server management studio into Stata. How do I get Stata to run the .sql file? I have used the -ado- procedure from another post, but it does not work because my database has a username and password.
Original -ado- code:
program define loadsql
*! Load the output of an SQL file into Stata, version 1.2 (dvmaster#gmail.com)
version 12.1
syntax using/, DSN(string) [CLEAR NOQuote LOWercase SQLshow ALLSTRing DATESTRing]
#delimit;
tempname mysqlfile exec line;
file open `mysqlfile' using `"`using'"', read text;
file read `mysqlfile' `line';
while r(eof)==0 {;
local `exec' `"``exec'' ``line''"';
file read `mysqlfile' `line';
};
file close `mysqlfile';
odbc load, exec(`"``exec''"') dsn(`"`dsn'"') `clear' `noquote' `lowercase' `sqlshow' `allstring' `datestring';
end;
help odbc discusses connect_options for connecting to odbc data sources. Two of which are u(userId) and p(password) which can be added to the original code written by #Dimitriy V. Masterov (see post here).
I believe you should be able to connect using SQL Server authentication by adding the u(string) and p(string) as additional options following syntax in the ado file, and then again down below following
odbc load, exec(`"``exec''"') dsn(`"`dsn'"')
This would also require that you pass these arguments to the program when you call it:
loadsql using "./sqlfile.sql", dsn("mysqlodbcdata") u(userId) p(Password)

Powershell 4.0 - plink and table-like data

I am running PS 4.0 and the following command in interaction with a Veritas Netbackup master server on a Unix host via plink:
PS C:\batch> $testtest = c:\batch\plink blah#blersniggity -pw "blurble" "/usr/openv/netbackup/bin/admincmd/nbpemreq -due -date 01/17/2014" | Format-Table -property Status
As you can see, I attempted a "Format-Table" call at the end of this.
The resulting value of the variable ($testtest) is a string that is laid out exactly like the table in the Unix console, with Status, Job Code, Servername, Policy... all that listed in order. But, it populates the variable in Powershell as just that: a vanilla string.
I want to use this in conjunction with a stored procedure on a SQL box, which would be TONS easier if I could format it into a table. How do I use Powershell to tabulate it exactly how it is extracted from the Unix prompt via Plink?
You'll need to parse it and create PS Objects to be able to use the format-* cmdlets. I do enough of it that I wrote this to help:
http://gallery.technet.microsoft.com/scriptcenter/New-PSObjectFromMatches-87d8ce87
You'll need to be able to isolate the data and write a regex to capture the bits you want.

Generating TPC-DS database for sql server

How do I populate the Transaction Processing Performance Council's TPC-DS database for SQL Server? I have downloaded the TPC-DS tool but there are few tutorials about how to use it.
In case you are using windows, you gotta have visual studio 2005 or later. Unzip dsgen in the folder tools there is dsgen2.sln file, open it using visual studio and build the project, will generate tables for you, I've tried that and I loaded tables manually into sql server
I've just succeeded in generating these queries.
There are some tips may not the best but useful.
cp ${...}/query_templates/* ${...}/tools/
add define _END = ""; to each query.tpl
${...}/tools/dsqgen -INPUT templates.lst -OUTPUT_DIR /home/query99/
Let's describe the base steps:
Before go to the next steps double-check that the required TPC-DS Kit has not been already prepared for your DB
Download TPC-DS Tools
Build Tools as described in 'v2.11.0rc2\tools\How_To_Guide-DS-V2.0.0.docx' (I used VS2015)
Create DB
Take the DB schema described in tpcds.sql and tpcds_ri.sql (they located in 'v2.11.0rc2\tools\'-folder), suit it to your DB if required.
Generate data that be stored to database
# Windows
dsdgen.exe /scale 1 /dir .\tmp /suffix _001.dat
# Linux
dsdgen -scale 1 -dir /tmp -suffix _001.dat
Upload data to DB
# example for ClickHouse
database_name=tpcds
ch_password=12345
for file_fullpath in /tmp/tpc-ds/*.dat; do
filename=$(echo ${file_fullpath##*/})
tablename=$(echo ${filename%_*})
echo " - $(date +"%T"): start processing $file_fullpath (table: $tablename)"
query="INSERT INTO $database_name.$tablename FORMAT CSV"
cat $file_fullpath | clickhouse-client --format_csv_delimiter="|" --query="$query" --password $ch_password
done
Generate queries
# Windows
set tmpl_lst_path="..\query_templates\templates.lst"
set tmpl_dir="..\query_templates"
set dialect_path="..\..\clickhouse-dialect"
set result_dir="..\queries"
set tmpl_name="query1.tpl"
dsqgen /input %tmpl_lst_path% /directory %tmpl_dir% /dialect %dialect_path% /output_dir %result_dir% /scale 1 /verbose y /template %tmpl_name%
# Linux
# see for example https://github.com/pingcap/tidb-bench/blob/master/tpcds/genquery.sh
To fix the error 'Substitution .. is used before being initialized' follow this fix.