JSR 352: How do you write to an MVS dataset from a Java Batch program? - file-io

I need to write to a non-VSAM dataset on the mainframe. I know that we need to use the ZFile library to do it, and I found how to do it here.
I am running my Java batch job in WebSphere Liberty on z/OS. How do I specify the dataset? Can I directly give the dataset a name like this?
dsnFile = new ZFile("X.Y.Z", "wb,type=record,noseek");
I am able to write to a text file on the server itself using Java's FileWriter, but I don't know how to access an MVS dataset.
I am relatively new to the world of z/OS and mainframes.

It sounds like you might be asking more generally how to use the ZFile API on WebSphere Liberty on z/OS.
Have you tried something like:
String pdsName = ZFile.getSlashSlashQuotedDSN("X.Y.Z");
ZFile zfile = new ZFile(pdsName, ...options...);
As far as batch-specific use cases go, you may have to differentiate between writing to a new file that's created for the first time on an original execution, and appending to an already-existing one on a restart.
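For instance, a rough sketch (here isRestart is a hypothetical flag standing in for however your job detects a restart, e.g. via the batch runtime's StepContext; the mode strings follow the ZFile Javadoc quoted below):
// Hypothetical restart indicator; obtain it from your batch context.
boolean isRestart = ...;
String dsn = ZFile.getSlashSlashQuotedDSN("X.Y.Z");
// Append on a restart; (re)create on the original execution.
String mode = isRestart
        ? "ab,type=record,noseek"
        : "wb,type=record,recfm=fb,lrecl=80,noseek";
ZFile dsnFile = new ZFile(dsn, mode);
try {
    byte[] record = new byte[80]; // pad each record to the LRECL
    dsnFile.write(record);
} finally {
    dsnFile.close();
}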
You also might find some useful snippets in this doctorbatch.io repo, along with the original link you posted.
For reference, I'll copy/paste from the ZFile Javadoc:
ZFile dd = new ZFile("//DD:MYDD", "r");
Opens the DD named MYDD for reading
ZFile dsn = new ZFile("//'SYS1.HELP(ACCOUNT)'", "rt");
Opens the member ACCOUNT from the PDS SYS1.HELP for reading text records
ZFile dsn = new ZFile("//SEQ", "wb,type=record,recfm=fb,lrecl=80,noseek");
Opens the data set {MVS_USER}.SEQ for sequential binary writing. Note that ",noseek" should be specified with "type=record" if access is sequential, since performance is greatly improved.
One final note: another couple of useful ZFile helper methods are bpxwdyn() and getFullyQualifiedDSN().
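For example (a hedged sketch: getFullyQualifiedDSN() prefixes the current user's high-level qualifier, and the allocation string follows the documented BPXWDYN request syntax):
// Expands "SEQ" to something like MYUSER.SEQ
String fullDsn = ZFile.getFullyQualifiedDSN("SEQ");
// Dynamically allocate that dataset to DD MYDD
ZFile.bpxwdyn("alloc fi(mydd) da(" + fullDsn + ") shr msg(wtp)");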

Related

Can I use a .txt file as a user database in Telegram? I use Telethon

So, I created a minigame bot on Telegram. The bot just contains a fishing game, and it's already running. I want that when a user fishes and gets a fish, the fish is stored in a database, so the user can see what he got while fishing. Does this require SQL?
I haven't tried anything, because I don't understand how to store data in Python. If there is a tutorial related to this, please share it in the comments. Thank you.
You can use anything to store user data, including text files.
One of the simplest approaches to storing data is serializing a dictionary to JSON with the built-in json module:
import json

DATABASE = 'database.json'  # (name or extension don't actually matter)

# loading
with open(DATABASE, 'r', encoding='utf-8') as fd:
    user_data = json.load(fd)

user_data[1234] = 5  # pretend user 1234 scored 5 points

# saving
with open(DATABASE, 'w', encoding='utf-8') as fd:
    json.dump(user_data, fd)
This only supports simple data types. If you need to store custom classes, then (as long as you don't upgrade your Python version) you can use the built-in pickle module:
import pickle

DATABASE = 'database.pickle'  # (name or extension don't actually matter)

# loading
with open(DATABASE, 'rb') as fd:
    user_data = pickle.load(fd)

user_data[1234] = 5  # pretend user 1234 scored 5 points

# saving
with open(DATABASE, 'wb') as fd:
    pickle.dump(user_data, fd)
Whether this is a good idea or not depends on how many users you expect your bot to have. If it's even a hundred, these approaches will work just fine. If it's in the thousands, perhaps you could use separate files per user, and still be okay. If it's more than that, then yes, using any database, including the built-in sqlite3 module, would be a better idea. There are many modules for different database engines, but using SQLite is often enough (and there are also libraries that make using SQLite easier).
Telethon itself uses the sqlite3 module to store the authorization key, and a cache for users it has seen. It's not recommended to reuse that same file for your own needs though. It's better to create your own database file if you choose to use sqlite3.
Using a .txt file as a database is a terrible idea; go with SQL.

Fetching a big SQL table in a web app session

I'm quite new to web apps, so apologies if my question is a bit basic. I'm developing a web app with R Shiny where the inputs are very large tables from an Azure SQL Server: 20 tables, each on the order of hundreds of thousands of rows and hundreds of columns containing numbers, characters, etc. I have no problem calling them; my main issue is that fetching everything from Azure SQL Server takes approximately 20 minutes, so the user of the web app has to wait quite a long time.
I'm using DBI package as follows:
db_connect <- function(database_config_name) {
  dbConfig <- config::get(database_config_name)
  connection <- DBI::dbConnect(
    odbc::odbc(),
    Driver   = dbConfig$driver,
    Server   = dbConfig$server,
    UID      = dbConfig$uid,
    PWD      = dbConfig$pwd,
    Database = dbConfig$database,
    encoding = "latin1"
  )
  return(connection)
}
and then fetching tables with:
connection <- db_connect(db_config_name)
table <- dplyr::tbl(
  connection,
  dbplyr::in_schema(
    fetch_schema_name(db_config_name, table_name, data_source_type),
    fetch_table_name(db_config_name, table_name, data_source_type)
  )
)
I searched a lot but didn't come across a good solution; I'd appreciate any solution that can tackle this problem.
I work with R accessing SQL Server (not Azure) daily. For larger data (as in your example), I always revert to using the command-line tool sqlcmd; it is significantly faster. The only pain point for me was learning the arguments and working around the fact that it does not return proper CSV, so there is post-query munging required. You may have an additional pain point of having to adjust my example to connect to your Azure instance (I do not have an account).
In order to use this in a shiny environment and preserve its interactivity, I use the processx package to start the process in the background and then poll its exit status periodically to determine when it has completed.
Up front: this is mostly a "loose guide"; I do not pretend that this is a fully-functional solution for you. There might be some rough edges that you need to work through yourself. For instance, while I say you can do it asynchronously, it is up to you to work the polling process and delayed data availability into your shiny application. My answer here covers starting the process and reading the file once complete. And finally, if encoding= is an issue for you, I don't know if sqlcmd does non-latin correctly, and I don't know if or how to fix this with its very limited, even antiquated, arguments.
Steps:
Save the query into a text file. Short queries can be provided on the command-line, but past some point (128 chars? I don't know that it's clearly defined, and have not looked enough recently) it just fails. Using a query-file is simple enough and always works, so I always use it.
I always use temporary files for each query instead of hard-coding the filename; this just makes sense. For convenience (for me), I use the same tempfile base name and append .sql for the query and .csv for the returned data, that way it's much easier to match query-to-data in the temp files. It's a convention I use, nothing more.
tf <- tempfile()
# using the same tempfile base name for both the query and csv-output temp files
querytf <- paste0(tf, ".sql")
writeLines(query, querytf)
csvtf <- paste0(tf, ".csv")
# these may be useful in troubleshooting, but not always [^2]
stdouttf <- paste0(tf, ".stdout")
stderrtf <- paste0(tf, ".stderr")
Make the call. I suggest you see how fast this is in a synchronous way first to see if you need to add an async query and polling in your shiny interface.
exe <- "/path/to/sqlcmd" # or "sqlcmd.exe"
args <- c("-W", "-b", "-s", "\037", "-i", querytf, "-o", csvtf,
          "-S", dbConfig$server, "-d", dbConfig$database,
          "-U", dbConfig$uid, "-P", dbConfig$pwd)
## as to why I use "\037", see [^1]
## note that the user id and password will be visible on the shiny server
## via a `ps -fax` command-line call
proc <- processx::process$new(command = exe, args = args,
                              stdout = stdouttf, stderr = stderrtf) # other args exist
# this should return immediately, and should be TRUE until
# data retrieval is done (or error)
proc$is_alive()
# this will hang (pause R) until retrieval is complete; if/when you
# shift to asynchronous queries, do not do this
proc$wait()
One can use processx::run instead of process$new and proc$wait(), but I thought I'd start you down this path in case you want/need to go asynchronous.
If you go with an asynchronous operation, then periodically check (perhaps every 3 or 10 seconds) proc$is_alive(). Once that returns FALSE, you can start processing the file. During this time, shiny will continue to operate normally. (If you do not go async and therefore choose to proc$wait(), then shiny will hang until the query is complete.)
If you do not proc$wait() but try to continue with reading the file anyway, that's a mistake. The file may not exist, in which case reading it will err with No such file or directory. The file may exist, perhaps empty. It may exist and have incomplete data. So really, make a firm decision: either stay synchronous and therefore call proc$wait(), or go asynchronous and poll periodically until proc$is_alive() returns FALSE.
Reading in the file. There are three "joys" of using sqlcmd that require special handling of the file.
It does not do embedded quotes consistently, which is why I chose to use "\037" as a separator. (See [^1].)
It adds a line of dashes under the column names, which will corrupt the auto-classing of data when R reads in the data. For this, we do a two-step read of the file.
Nulls in the database are the literal NULL string in the data. For this, we update the na.strings= argument when reading the file.
exitstat <- proc$get_exit_status()
if (exitstat == 0) {
  ## read #1: get the column headers
  tmp1 <- read.csv(csvtf, nrows = 2, sep = "\037", header = FALSE)
  colnms <- unlist(tmp1[1,], use.names = FALSE)
  ## read #2: read the rest of the data
  out <- read.csv(csvtf, skip = 2, header = FALSE, sep = "\037",
                  na.strings = c("NA", "NULL"), quote = "")
  colnames(out) <- colnms
} else {
  # you should check both stdout and stderr files, see [^2]
  stop("'sqlcmd' exit status: ", exitstat)
}
Notes:
[^1]: After a lot of pain with several issues (some in sqlcmd.exe, some in data.table::fread and other readers, all dealing with CSV-format non-compliance), at one point I chose to stop working with comma-delimited returns, instead opting for the "\037" field delimiter. It works fine with all CSV-reading tools and has fixed so many problems (some not mentioned here). If you're not concerned, feel free to change the args to "-s", "," (adjusting the read as well).
[^2]: sqlcmd seems to use stdout or stderr in different ways when there are problems. I'm sure there's rationale somewhere, but the point is that if there is a problem, check both files. I added the use of both stdout= and stderr= because of a lot of troubleshooting I did, and continue to do if I munge a query. Using them is not strictly required, but you might be throwing caution to the wind if you omit those options.
By the way, if you choose to only use sqlcmd for all of your queries, there is no need to create a connection object in R. That is, db_connect may not be necessary. In my use, I tend to use "real" R DBI connections for known-small queries and the bulk sqlcmd for anything above around 10K rows. There is a tradeoff; I have not measured it sufficiently in my environment to know where the tipping point is, and it is likely different in your case.

Wait.on(signals) use in Apache Beam

Is it possible to write to a 2nd BigQuery table after writing to the 1st has finished, in a batch pipeline, using the Wait.on() method (a new feature in Apache Beam 2.4)? The example given in the Apache Beam documentation is:
PCollection<Void> firstWriteResults = data.apply(ParDo.of(...write to first database...));
data.apply(Wait.on(firstWriteResults))
// Windows of this intermediate PCollection will be processed no earlier than when
// the respective window of firstWriteResults closes.
.apply(ParDo.of(...write to second database...));
But why would I write to database from within ParDo? Can we not do the same by using the I/O transforms given in Dataflow?
Thanks.
Yes, this is possible, although there are some known limitations, and there is currently some work being done to further support this.
In order to make this work you can do something like the following:
WriteResult writeResult = data.apply(BigQueryIO.write()
    ...
    .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
);
data.apply(Wait.on(writeResult.getFailedInserts()))
    .apply(...some transform which writes to second database...);
It should be noted that this only works with streaming inserts and won't work with file loads. At the same time, there is some work being done currently to better support this use case, which you can follow here.
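To make the pattern a bit more concrete, here is a slightly fuller sketch under the same assumptions (the table specs are placeholders; BigQueryIO, WriteResult, and Wait come from the Beam SDK):
PCollection<TableRow> rows = ...; // your data

WriteResult firstWrite = rows.apply("WriteFirst",
    BigQueryIO.writeTableRows()
        .to("project:dataset.first_table")
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));

// getFailedInserts() serves as the signal: each window of `rows` is held
// back until the corresponding window of the signal closes.
rows.apply(Wait.on(firstWrite.getFailedInserts()))
    .apply("WriteSecond",
        BigQueryIO.writeTableRows()
            .to("project:dataset.second_table")
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS));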
Helpful references:
http://moi.vonos.net/cloud/beam-send-pubsub/
http://osdir.com/apache-beam-users/msg02120.html

Easy way to get and modify a .vb file like a text file

I'm creating a VB.NET application for converting Data Access Layer classes from Oracle-DB-oriented to SQL-Server-DB-oriented,
and I must convert every Oracle-oriented class, like:
OracleCommand, OracleDataAdapter, OracleDataReader...
to its SQL Server equivalent.
Is there any way to get the list of methods from a .vb file, and the body of each method, from a VB application, so that I can change it dynamically and create a new version of each file?
If you compile in debug mode, one of the output files is PROGRAMNAME.xml, which contains all the method signatures for the EXE. (Where PROGRAMNAME is the name of your EXE).
You can write a program to edit the .vb files just like any other text file, because they are text files.
Download the trial version of ReSharper and it will help you do it with no pain. The one-month trial should be enough.
VS also has some tools: "Find All References" and "Find and Replace in Files". So, you can find and replace Oracle... with Sql...
On another note, why not write wrappers that provide DB-agnostic providers for command, transaction, etc.? For example, instead of
Dim reader As OracleDataReader = OracleCommand.ExecuteReader...
Do this
Dim reader As IDataReader = MyAgnosticCommand.ExecuteReader...
So, if a new boss comes tomorrow and says, "Oh, we need Oracle" - you don't have to do it all over again.

Inferring topics with mallet, using the saved topic state

I've used the following command to generate a topic model from some documents:
bin/mallet train-topics --input topic-input.mallet --num-topics 100 --output-state topic-state.gz
I have not, however, used the --output-model option to generate a serialized topic trainer object. Is there any way I can use the state file to infer topics for new documents? Training is slow, and it'll take a couple of days for me to retrain, if I have to create the serialized model from scratch.
We did not use the command-line tools shipped with Mallet; we just used the Mallet API to create the serialized model for inference on new documents. Two points need special notice:
You need to serialize out the pipes you used just after you finish training (in my case, SerialPipes)
And of course the model also needs to be serialized after you finish training (in my case, ParallelTopicModel)
Please check the Javadoc (a sketch of both steps follows the links):
http://mallet.cs.umass.edu/api/cc/mallet/pipe/SerialPipes.html
http://mallet.cs.umass.edu/api/cc/mallet/topics/ParallelTopicModel.html
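A minimal sketch of that flow, assuming trainingInstances and model are the InstanceList and ParallelTopicModel from your training run (file names are placeholders):
// Right after training: save the pipes and the model.
ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("pipes.ser"));
oos.writeObject(trainingInstances.getPipe()); // e.g. a SerialPipes
oos.close();
model.write(new File("model.ser")); // ParallelTopicModel

// Later: restore both and infer topics for a new document.
ObjectInputStream ois = new ObjectInputStream(new FileInputStream("pipes.ser"));
Pipe pipe = (Pipe) ois.readObject();
ois.close();
ParallelTopicModel restored = ParallelTopicModel.read(new File("model.ser"));
TopicInferencer inferencer = restored.getInferencer();

InstanceList newDocs = new InstanceList(pipe);
newDocs.addThruPipe(new Instance("text of the new document", null, "doc1", null));
double[] topicDist = inferencer.getSampledDistribution(newDocs.get(0), 100, 10, 10);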
Restoring a model from the state file appears to be a new feature in Mallet 2.0.7, according to the release notes:
Ability to restore models from gzipped "state" files. From the new TopicTrainer, use the --input-state [filename] argument. Note that you can manually edit this file. Any token with topic set to -1 will be immediately resampled upon loading.
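If that reading is right, something along these lines should resume from the saved state and also write out an inferencer you can apply to new documents (file names beyond the original command are placeholders; I have not verified these exact flags against 2.0.7):
bin/mallet train-topics --input topic-input.mallet --num-topics 100 \
    --input-state topic-state.gz --num-iterations 100 \
    --inferencer-filename inferencer.mallet
bin/mallet infer-topics --input new-docs.mallet \
    --inferencer inferencer.mallet --output-doc-topics doc-topics.txt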
If you mean you want to see how new documents fit into a previously trained topic model, then I'm afraid there is no simple command you can use to do it right.
The class cc.mallet.topics.LDA in Mallet 2.0.7's source code provides such a utility; try to understand it and use it in your program.
P.S. If my memory serves, there is some problem with the implementation of this method in that class:
public void addDocuments(InstanceList additionalDocuments,
                         int numIterations, int showTopicsInterval,
                         int outputModelInterval, String outputModelFilename,
                         Randoms r)
You have to rewrite it.