How to extract SQL code from KNIME nodes? - sql

Is there a way to automatically extract code from nodes and save it in .sql or .txt files?
I'm mostly using Database SQL Executor (legacy) nodes in which I have SQL queries.
I've found that there is a settings.xml file for every node, in which I can see the code as the value for key="statement"; maybe I could use the XML Reader and XPath nodes somehow?
I would like to have a .sql or .txt file for every node; that file should contain the SQL code pasted into that particular node. It would be great if I could use the name of the node as the name of the file.

The Vernalis community contribution has a DB to Variable node. If you change the input port type to DB Data Port, then one of the output variables will be the SQL query. If you are using the Legacy nodes, then the corresponding Database To Variable (Legacy) node will do the same thing.
Once you have the SQL in a variable, you can use a Variable to Table Row node, followed by e.g. the Vernalis Save File Locally node; if you require further options, the String to Binary Objects and Binary Objects to Files nodes will allow that.

I've decided to share my solution to this problem, in case someone else wants to do something similar.
I had to work in RStudio, so I wrote the script with Rcpp (which lets you embed C++ code in an R script).
The script takes the path to the KNIME workflow and iterates through every node folder, looking for "Database SQL Executor" and "Database Reader" nodes.
It then extracts the SQL code and the node name from the settings.xml file.
After saving them to variables, it strips the node name of characters not allowed in Windows file names (like ? : | \ /) and of artifacts added by the XML encoding.
The same goes for the SQL code, except that instead of stripping the XML artifacts it converts them back to the normal characters (for example, %%000010 becomes \n and &lt becomes <).
Once the SQL code is cleaned and formatted, it is saved to a .sql file whose name is the name of the node.
It works pretty well and quite fast. One annoying problem is that Rcpp doesn't handle UTF-8 characters, so I had to strip them from the node names manually to keep the names readable rather than full of nonsense.
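For anyone who would rather avoid Rcpp, the same idea can be sketched in Python. The entry keys ("name", "statement") and the %%000010 newline encoding below are taken from the description above, so verify them against your own KNIME version:

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

def extract_sql(workflow_dir, out_dir):
    """Walk a KNIME workflow folder and write each node's SQL to a .sql file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for settings in Path(workflow_dir).rglob("settings.xml"):
        name, statement = None, None
        # Iterate over all elements so we don't have to handle XML namespaces.
        for el in ET.parse(settings).getroot().iter():
            key = el.attrib.get("key")
            if key == "name":
                name = el.attrib.get("value")
            elif key == "statement":
                statement = el.attrib.get("value")
        if not statement:
            continue  # not a node with an SQL statement
        # KNIME stores newlines as "%%000010" (per the post above); XML
        # entities such as &lt; are already decoded by the parser itself.
        sql = statement.replace("%%000010", "\n")
        # Strip characters that Windows forbids in file names.
        safe = re.sub(r'[?:|\\/<>*"]', "_", name or settings.parent.name)
        (out / (safe + ".sql")).write_text(sql, encoding="utf-8")
        written.append(safe)
    return written
```

Calling `extract_sql(r"C:\knime\MyWorkflow", "sql_out")` would then leave one .sql file per SQL-carrying node.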

Related

How to read a tab-delimited .txt file and insert into an Oracle table

I want to read a tab delimited file using PLSQL and insert the file data into a table.
Everyday new file will be generated.
I am not sure if an external table will help here because the filename changes based on the date.
Filename: SPRReadResponse_YYYYMMDD.txt
Below is the sample file data.
An option that works from your own PC is SQL*Loader. As the file name changes every day, you'd use your operating system's batch scripting (on MS Windows, .BAT files) to pass a different name when calling sqlldr (along with the control file).
An external table requires you to have access to the database server and (at least) the read privilege on the directory which contains those .TXT files. Unless you're a DBA, you'll have to ask one to set up the environment. As for changing the file name, you could use alter table ... location, which is rather inconvenient.
If you want to have control over it, use UTL_FILE; yes, you still need access to that directory on the database server, but, writing a PL/SQL script, you can modify whatever you want, including the file name.
Or, a simpler option: first rename the input file to SPRReadResponse.txt, then load it, and save yourself all that trouble.
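As a sketch of the batch-script idea, here is how the daily sqlldr call could be assembled; the connect string and control-file name are placeholders for your own environment:

```python
import subprocess  # only needed if you actually launch sqlldr
from datetime import date

def sqlldr_command(day, connect="scott/tiger@orcl",
                   control="spr_read_response.ctl"):
    """Build the sqlldr invocation for the file generated on `day`."""
    data_file = "SPRReadResponse_{}.txt".format(day.strftime("%Y%m%d"))
    return ["sqlldr", connect,
            "control={}".format(control),
            "data={}".format(data_file)]

# In the daily job (requires an Oracle client on the PATH):
# subprocess.run(sqlldr_command(date.today()), check=True)
```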

How to watch Changes to SQlite Database and Trigger Shell Script

Note: I believe I may be missing a simple solution to this problem. I'm relatively new to programming. Any advice is appreciated.
The problem: A small team of people (~3-5) want to be able to automate, as far as possible, the filing of downloaded files in appropriate folders. Files will be downloaded into a shared downloads folder. The files in this downloads folder will be sorted into a large shared folder structure according to their file-type, URL the file was downloaded from, and so on and so forth. These files are stored on a shared server, and the actual sorting will be done by some kind of shell script running on the server itself.
Whilst there are some utilities which do this (such as Maid), they don't do everything I want them to do. Maid, for example, doesn't have a way to get the download URL of a file on Linux. Additionally, it is written in Ruby, which I'd like to avoid.
The biggest stumbling block, then, is finding a way to get the URL of the downloaded file so it can be passed into the shell script. Originally I thought this could be done via getfattr, which reads a file's extended attributes. Frustratingly, however, whilst Chromium saves a file's download URL as an extended attribute, Firefox doesn't seem to do the same thing. So relying on extended attributes seems to be out of the question.
What Firefox does do, however, is store download 'metadata' in the places.sqlite file, in two separate tables - moz_annos and moz_places. Inspired by this, I decided to build a Firefox extension that writes all information about a downloaded file to a SQLite database downloads.sqlite on our server upon the completion of said download. This includes the URL, MIME type, etc. of the downloaded file.
The idea is that with this data, the server could run a shell script that does some fine-grained sorting of the downloaded file into our shared file system.
However, I am struggling to find out a stable, reliable, and portable way of 'triggering' the script that will actually move the files, as well as passing information about these files to the script so that it can sort them accordingly.
There are a few ways I thought I could go about this. I'm not sure which method is the most appropriate:
1) Watch Downloads Folder
This method would watch for changes to the shared downloads directory, then use the file name of the downloaded file to query downloads.sqlite, getting the matching row, then finally passing the file's attributes into a bash script that sorts said file.
Difficulties: Finding a way to reliably match the downloaded file with the appropriate record in the database. Files may have the same download name but need to be sorted differently, perhaps, for example, if they were downloaded from a different URL. Additionally, I'd like to get additional attributes like whether the file was downloaded in incognito mode.
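A minimal sketch of that lookup; the downloads table and its columns here are assumptions about what the extension would write, not a real schema:

```python
import sqlite3

def rows_for_file(db_path, filename):
    """Return every download record whose file name matches."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT rowid, url, incognito FROM downloads "
            "WHERE filename = ?", (filename,))
        return cur.fetchall()
    finally:
        con.close()
```

If more than one row comes back, the file name alone is ambiguous (exactly the matching difficulty described above) and you would need a tiebreaker such as comparing the record's download timestamp with the file's mtime.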
2) Create Auxiliary 'Helper' File
Upon a file download event, the extension creates a 'helper' text file, whose name is the downloaded file's name plus some marker, and which contains the additional file attributes:
/Downloads/
mydownload.pdf
mydownload-downloadhelper.txt
The server can then watch for the creation of a .txt file in the downloads directory and run the necessary shell script from there.
Difficulties: Whilst this avoids using a SQLite database, it seems rather ungraceful and hacky, and I can see a multitude of ways in which this method would just break or not work.
3) Watch SQLite Database
This method writes to the shared SQLite database downloads.sqlite on the server and then, by some means, watches for new rows being inserted into it. This could either be done by watching the database for a new INSERT on a table, or by having a SQLite trigger on INSERT that runs a bash script, passing the download information on to it.
Difficulties: there doesn't seem to be an easy way to watch a SQLite database for a new row insert, and a trigger within SQLite doesn't seem to be able to launch an external script/program. I've searched high and low for a method of doing either of these two things, but I'm struggling to find any documented way that I am able to understand.
What I would like is:
Some feedback on which of these methods is appropriate, or if there is a more appropriate method that I am overlooking.
An example of a system/program that does something similar to this.
Many thanks in advance.
It seems to me that you have put "the cart before the horse":
Use cron to periodically check for new downloads. Process them on the command line instead of trying to trigger things from inside sqlite3:
a) Here is an approach using your shared sqlite3 database "downloads.sqlite":
Upfront once:
Add a table to your database containing just an integer as record counter and a timeStamp field, e.g., "table_counter":
sqlite3 downloads.sqlite "CREATE TABLE table_counter (counter INTEGER PRIMARY KEY NOT NULL, timestamp DATETIME DEFAULT (datetime('now','UTC')));" 2>/dev/null
Insert an initial record into this new table setting the "counter" to zero and recording a timeStamp:
sqlite3 downloads.sqlite "INSERT INTO table_counter VALUES (0, (SELECT datetime('now','UTC')));" 2>/dev/null
Every so often:
Query the table containing the downloads with a "SELECT COUNT(*)" statement:
sqlite3 downloads.sqlite "SELECT COUNT(*) from table_downloads;" 2>/dev/null
Result e.g., 20
Compare this number to the number stored in the record counter field:
sqlite3 downloads.sqlite "SELECT (counter) from table_counter;" 2>/dev/null
Result e.g., 17
If the result from 3) is greater than the result from 4), then you have downloaded more files than you have processed.
If so, query the table containing the downloads with a "SELECT" statement for the oldest not yet processed download, using a "subselect":
sqlite3 downloads.sqlite "SELECT * from table_downloads where rowid = (SELECT (counter+1) from table_counter);" 2>/dev/null
In my example this would SELECT all values for the data record with the rowid of 17+1 = 18;
Do your magic in regards to the downloaded file stored as record #18.
Increase the record counter in the "table_counter", again using a subselect:
sqlite3 downloads.sqlite "UPDATE table_counter SET counter = (SELECT (counter) from table_counter)+1;" 2>/dev/null
Finally, update the timeStamp for the "table_counter":
Why? Shit happens on shared drives... This way you can always check how many download records have been processed and when this has happened last time.
sqlite3 downloads.sqlite "UPDATE table_counter SET timeStamp = datetime('now','UTC');" 2>/dev/null
If you want to have a log of this processing then change the SQL statements in 4) to a "SELECT COUNT(*)" and in 7) to an "INSERT counter" and its subselect to an "(SELECT (counter+1) from table_counter)" respectively ...
Please note: The redirections " 2>/dev/null" at the end of the SQL statements are just to suppress this kind of line issued by newer versions of SQLite3 before showing your query results.
-- Loading resources from /usr/home/bernie/.sqliterc
If you don't like timeStamps based on UTC then use localtime instead:
(datetime('now','localtime'))
Put steps 3) through 8) in a shell script and use a cron entry to run this query/comparison periodically...
Use the complete /path/to/sqlite3 in this shell script (just in case it's running on a shared drive; someone could be fooling around with paths and surprise your cron...).
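For completeness, steps 3) through 8) can also be collapsed into one small Python script for the cron entry. It mirrors the sqlite3 CLI calls above and assumes the same table_downloads/table_counter layout:

```python
import sqlite3

def process_new_downloads(db_path, handle):
    """Process every download record past the stored counter; returns how many."""
    con = sqlite3.connect(db_path)
    try:
        processed = 0
        while True:
            # Steps 3) and 4): compare total downloads to the counter.
            total = con.execute(
                "SELECT COUNT(*) FROM table_downloads").fetchone()[0]
            counter = con.execute(
                "SELECT counter FROM table_counter").fetchone()[0]
            if counter >= total:
                break  # nothing new to process
            # Step 5): fetch the oldest not-yet-processed record.
            row = con.execute(
                "SELECT * FROM table_downloads WHERE rowid = ?",
                (counter + 1,)).fetchone()
            handle(row)  # step 6): "do your magic" with this record
            # Steps 7) and 8): bump the counter and refresh the timestamp.
            con.execute(
                "UPDATE table_counter SET counter = counter + 1, "
                "timestamp = datetime('now','UTC')")
            con.commit()
            processed += 1
        return processed
    finally:
        con.close()
```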
b) I will give you a simpler approach using awk and some hash like md5 in a separate answer,
so it is easier for future readers and easier for you to "rate" :-)

Dynamically populate external tables location

I'm trying to use oracle external tables to load flat files into a database but I'm having a bit of an issue with the location clause. The files we receive are appended with several pieces of information including the date so I was hoping to use wildcards in the location clause but it doesn't look like I'm able to.
I think I'm right in assuming I'm unable to use wildcards, does anyone have a suggestion on how I can accomplish this without writing large amounts of code per external table?
Current thoughts:
The only way I can think of doing it at the moment is to have a shell watcher script and a parameter table. The user can specify the input directory, file mask, external table, etc. When a file is found in the directory, the shell script generates a list of files matching the file mask. For each file found, it issues an alter table command to change the location of the given external table to that file and launches the rest of the PL/SQL associated with it. This can be repeated for each file matching the mask. A side benefit is that I could also append the date to the log and bad files after each run.
I'll post the solution I went with in the end which appears to be the only way.
I have a file watcher that looks for files in a given input directory with a certain file mask. The lookup table also includes the name of the external table. I then simply issue an alter table on the external table with the list of new file names.
For me this wasn't much of an issue as I'm already using shell for most of the file watching and file manipulation. Hopefully this saves someone searching for ages for a solution.
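As an illustration of that watcher, here is a sketch that builds the alter table statements for every file matching a mask. The table name and directory would come from your own parameter/lookup table, and the file names in the LOCATION clause are resolved against the external table's directory object:

```python
import fnmatch
from pathlib import Path

def alter_location_statements(input_dir, mask, ext_table):
    """One ALTER TABLE ... LOCATION statement per file matching the mask."""
    stmts = []
    for f in sorted(Path(input_dir).iterdir()):
        if f.is_file() and fnmatch.fnmatch(f.name, mask):
            stmts.append("ALTER TABLE {} LOCATION ('{}')".format(
                ext_table, f.name))
    return stmts
```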

Iterating through folder - handling files that don't fit schema

I have a directory containing multiple xlsx files and what I want to do is to insert the data from the files in to a database.
So far I have solved this by using tFileList -> tFileInputExcel -> tPostgresOutput
My problem begins when one of these files doesn't match the defined schema and returns an error, interrupting the workflow.
What I need to figure out is whether it's possible to skip that file (moving it to another folder, for instance) and continue iterating over the rest of the files.
If I check the option "Die on error" the process ends and doesn't process the rest of the files.
I would approach this by making your initial input schema on the tFileInputExcel be all strings.
After reading the file I would then validate the schema using a tSchemaComplianceCheck set to "Use another schema for compliance check".
You should be able to then connect a reject link from the tSchemaComplianceCheck to a tFileCopy configured to move the file to a new directory (if you want it to move it then just tick "Remove source file").
Here's a quick example: set the other schema for the compliance check so that it now checks that id and age are Integers, and then configure the tFileCopy to move the file.
Your main flow from the tSchemaComplianceCheck can carry on using just strings if you are inserting into a database. You might want to use a tConvertType to change things back to the correct data types after this if you are doing any processing that requires proper data types or you are using your tPostgresOutput component to create the table as well.

Execute service builder generated sql file on postgresql

I would like to execute the SQL files generated by Service Builder, but the problem is that the files contain types like LONG, VARCHAR, etc.
Some of these types don't exist in PostgreSQL (for example, LONG is bigint there).
I don't know if there is a simple way to convert the SQL files' structures so that they can run on PostgreSQL?
Execute ant build-db on the plugin and you will find an sql folder with various vendor-specific scripts.
Daniele is right; using the build-db task is obviously correct and the right way to do it.
But... I remember a similar situation some time ago: I had only the liferay-pseudo-sql file and needed to create a proper DDL. I managed to do it in the following way:
You need to have Liferay running on your desktop (or on the machine where the source sql file is), as this operation requires the portal Spring context to be fully wired.
Go to Configuration -> Server Administration -> Script
Change language to groovy
Run the following script:
import com.liferay.portal.kernel.dao.db.DB
import com.liferay.portal.kernel.dao.db.DBFactoryUtil
DB db = DBFactoryUtil.getDB(DB.TYPE_POSTGRESQL)
db.buildSQLFile("/path/to/folder/with/your/sql", "filename")
The first parameter is obviously the path and the second is the file name without the .sql extension. The file on disk should have the proper extension: it must be called filename.sql.
This will produce a tables folder next to your filename.sql, containing a single tables-postgresql.sql with your Postgres DDL.
As far as I remember, Service Builder uses the same method to generate database-specific code.