I just got a task to check hundreds of jobs in SDSF.ST and save these job logs into specified datasets. I think I should automate this task using REXX but actually not familiar with the language. Having searched on Google, I still can't find a workable solution.
Anyone has any experience on this?
Take a look at the Rexx interface to SDSF. As this is a language with which you are unfamiliar, you will likely find use for the programming reference and user's guide. The Rexx Language Association has some links that may be helpful in getting up to speed, but Rexx was designed to be easy to learn and use so you shouldn't have too much trouble.
You could also use the XDC line command if that's quicker than writing a program.
Another possibility is the observation that SDSF is really scanning/summarizing JES2 datasets. So, is the requirement to store sysout datasets created by batch jobs? if so, then it is possible to code the batch job so that it's sysout is stored.
if the requirement is to store the joblog itself, then get a JES2 manual and read-up on how JES manages joblogs (it has been awhile, so I have forgotten this info.). After learning how JES2 manages its job logs, then there might be an obvious solution to saving them.
The above might have some advantages. For example, what if a new job is added to the system that is needed to be saved? Do you update your REXX code or a dataset of jobnames, or is the JCL for the job itself coded in a way to save the required sysout/joblog?
This makes a difference. If someone has to update a list everytime a new production job is added, then this is time consuming and error prone. If the JCL has to be coded in a specific way then it only has to be documented and it is easy for managers to say "code production jobs in a such-and-so fashion.
Related
Just a bit of background on where my question is coming from: my company has multiple databases across the globe that uses the same schema and once of my department's responsibility is to monitor and make sure all these DBs are in sync from a schema SQL change perspective.
Now, my question is if anyone knows of any Software/tool that has a a Frontend UI which is able to do the following (the lower number the more important to have):
Able to track what SQL code change was applied on which database and when. Basically, if we write a SQL query that changed the structure of a table and we need it applied to 80% or 100% percent of the DBs, either via manual input or some automatic check the tool will tell me that yes, this was indeed applied.
Code distribution tool: we give it the query or a file that contains the code and it's able to push to the Databases it needs to (and create the audit log for that)
Code/object repository: keeps track of what was custom developed and pushed to the databases
I know SSIS might be able to do some of these things, but we need a tool that also has a simple frontend interface that can be accessed by non-IT personnel. (*clarification: we are not planning on giving non-DBA people access to change things, just to the audit aspect of said tool)
I've tried searching the internet, but i have a feeling i'm not using the right vocabulary to get the results i'm looking for.
Hence i wanted to see if the community was aware of any such tool or something similar?
Try searching for one of these two types of systems:
Release/Build/Deployment Automation Complex programs like Serena that have modules for pushing, tracking, and auditing any kind of software, anywhere. These will include all the GUI bells and whistles. But you'll have to deal with extra databases, configuration, agents, workflows, consultants(?), etc. These programs are geared more towards developers.
Remote Execution/Configuration Management Simpler programs like Salt, Fabric, and Ansible that let you run operating system commands anywhere. They don't offer as many features, and you have to do more of the work yourself, but in some ways that's liberating. If you know exactly what commands you want to run you don't need some other program holding your hand. These programs are geared more towards administrators.
From a database administrator's point of view, the main problem with those types of programs is that none of them are relational. Yes they can connect to a database and run a script, but none of them really speak SQL. Their native languages are Java, XML, SSH, etc. There's nothing wrong with those technologies, but if you only care about databases you don't want to deal with all that complexity.
If you're not happy with either of those types of programs I recommend you look at my open source program Method5. It is a remote execution program built as an extension to Oracle SQL. It works entirely inside an Oracle database, so you can install it yourself and won't need any additional websites, agents, configuration files, GUIs, etc.
Based on your comment about getting bogged down by links, and my answer to your question about half a year ago, I think this is the kind of program you were gradually heading towards creating. It took my team a couple thousand hours of developing and testing to get it right so you were probably wise to give up on making your own.
To specifically answer your requirements:
Tracking Changes are stored in an audit trail. But more importantly it has the ability and a pre-built script to compare an unlimited number of schemas, all in one view. At the end of the day what you really want to know is "are my schemas the same", not necessarily "did the same thing get run everywhere?".
Code Distribution If you just have SQL or PL/SQL, deploying it through Method5 is as easy as it can possibly get. Just specify what you want to run, and where you want to run it, like this: select * from table(m5('create index ...', 'dev, qa, prodDB1, prodDB2')); The program does not (yet) run SQL*Plus scripts. But when you have the ability to run SQL and PL/SQL so easily there's little need for SQL*Plus.
Code Repository All executions are stored in a simple table, M5_AUDIT. It contains the code, who ran it, where they ran it, and how they ran it. It wasn't designed to be a repository like SVN but it's good enough for simple auditing and tracking code.
Method5 does not contain a GUI but in some ways I consider that to be a feature. Since everything is done relationally, everything is in a simple table. You can use any of your existing GUIs - Toad, PL/SQL Developer, Excel, Apex, etc. It's a robust back-end solution that will hopefully make a good foundation for easily building a simple front end.
I'm wondering what some of the best practices/tools are people have found for building and managing ETL jobs on bigquery.
At the moment I have lots of sql 'templates' (horribly parameterized by lob, date etc using sed type string replacements into a tmp.sql file and then running that) and I use the command line tool to run sequences of them and send output to tables. It works fine but is getting a bit unwieldy. I still don't get why I can't run stored procedure type parameterized scripts on bigquery. Or even some sort of gui to build and manage pipelines.
I love bigquery but really feel like I'm either missing something very obvious here or its a real gap in the product (e.g. Pretty sure Apache Drill more built out in this regard).
So just wondering if anyone can share any best practice etl tips or approaches you use yourself.
I do also use xplenty for some jobs which is good but it's also a bit messy in that I can't just write sql in it so can be painful to build and debug complicated pipelines.
Was thinking about looking into Talend also, but really parameterized stored procedures, macros and SQL is all i'd ideally need.
Sorry if this is more of a discussion question then specific code. Happy to move it to reddit or something if more suited there.
Google Cloud Dataflow is closer to your needs than BigQuery in my opinion. We use it for real-time streaming ETL with automatic scaling. Works great, though you will need to code Java.
I work primarily with so-called "Big Data"; the ETL and analytics parts. One of the challenges I constantly face is finding a good way to "test my data" so to speak. For my mapreduce and ETL scripts I write solid unit test coverage but if there are unexpected underlying changes in the data itself (coming from multiple application systems) the code won't necessarily throw a noticeable error which leaves me with bad / altered data that I don't know about.
Are there any best practices out there that help people keep an eye on what / how the underlying data may be changing?
Our technology stack is AWS EMR, Hive, Postgres, and Python. We're not really interested in bringing in a big ETL framework like Informatica.
You could create some kind of mapping files(maybe xml or something) as per the standards specific to your systems and validate your incoming data before putting it into your cluster, or maybe during the process itself. I was facing a similar issue sometime ago and ended up doing this.
I don't know how feasible it is for your data and your use case but it did the trick for us. I had to create the xml files once(I know it's boring and tedious, but worth giving a try) and now whenever I get new files I use these xml files to validate the data before putting it into my cluster to check whether the data is correct or not(as per the standards defined). This saves a lot of time and effort which would be involved if I have to check everything manually everytime I get some new data.
What would be the best approach for versioning my whole database ?
Creating a file for each database object (table,view,procedsure..) or rather having one file for all DDL scripts and any new change will be put in a separate file ?
What about handling changes made in a Database manager tool ?
I'd like to have a generic solutions for any kind of RDBMS.
Are there any other options ?
I'm a huge VCS fan in general and a big Mercurial booster, but I really think you're going down the wrong path.
VCSs aren't just about iterative changes, the "what", they're also about answering the "who", "when", and "why". For a database those answers are a lot less interesting or hard to provide to the VCS. If you're doing nightly exports and commits the "who" will always be "cron" and the "why" will always be "midnight".
The other thing modern VCSs do really well is helping you merge changes from multiple branches. That's less applicable in the database world. Very seldom do you say "I want this table structure, but this data", and if you do the text/diff merge isn't going to help you much.
The thing that does do "what" and "when" very well is an incremental backup system, and that's probably the better fit.
At work we use Tivoli and at home I use rdiff-backup and duplicity, but there are plenty of great options.
I guess my general rule of thumb is "if it was typed by hand by a human then it does into source control, and if it was generated/exported then it goes in the incremental backups"
Certainly you can make this work, but I don't think it will buy you much over the more traditional backup solutions.
Have a look at this post
If you need generic solution - put everything in the scripts (simple text files) and put under Version Control system (can be used any of VCS).
Grouping similar database objects into scripts will be depend on your requirement.
So you may for example:
Store table/indexes/ in one or several script
Each procedure store in individual script or combine small procedures into one script.
However need to remember one important thing with this approach: don't forget change scripts if you changed table/view/procedure directly in databases and don't create/recreate/compile you db objects in database after changing scripts.
SQL Source Control currently supports SVN and TFS, but Mercurial requests are increasing rapidly and we're hoping to have a story for this very soon.
We use UserVoice to measure demand so please vote accordingly if you're interesting in this: http://redgate.uservoice.com/forums/39019-sql-source-control
I had a (friendly but heated) argument with my lead developer the other day because our project has TSQL Scripts that I code directly into SQL files which I then run against the database. I find that when I do this, it's easy to work out the schema in advance without fiddly pointing and clicking and then there's no opportunity to forget to generate a script to put into source control as generating the script no longer becomes a chore you have to do after the fact, but is an implicit part of the process (and also leads to cleaner scripts without the extra crap that SQL Server Management Studio inserts into the scripts it generates).
My lead developer insists that having to manually script it out is a pain in the arse and that he absolutely refuses to write his scripts by hand when there are perfectly good tools to do it without coding. I've noticed that the copying of his changes into the actual scripts tends to get delayed a bit as a result though.
What are your thoughts on the pros and/or cons of doing it one way vs the other? Am I being too rigid/old-school in my sticking to hand coding schema scripts or is he being too reliant on third party tools and losing something in the process?
I always script stuff myself because the wizards sometimes don't script things in a way that I like it and will also give funky names to defaults
scripting things yourself is also good in case you get laid off and you have to go for an interview where they ask you to script DDL on the whiteboard
As I usually collaborate with a colleague during the schema design, I tend to design the schema using the GUI tools, as its easier to discuss it with a diagram of the tables in front of you. I then generate the scripts, being careful to select the exact options that I want to avoid having to make manual changes post-export.
I think a decision on the relative merits of the two approaches might take into account factors such as
the frequency of changes to the schema
the frequency with which changes need to be propagated to other schemas (test, user acceptance, production, clients * n, etc)
the degree to which the schema may vary across development branches
how well-known in advance your various changes can be scheduled
whether or not you can generate SQL "diff" scripts between schemas.
On balance, I tend to prefer to work with a script for each change (or "migration"). It lets me resequence change releases as priorities shift.
Just because you can create tables in a graphical tool doesn't necessarily mean you should.
I find its as quick to write a script as it is to use SQLMS. You still have to type names in SQLMS, and the time spent moving from keyboard and mouse could be used writing the proper script anyway.
The two of you are almost working with two sets of code. Consistency seems to be a key factor on these types of decisions. In your case, if you create a script, your boss uses the gui to add a field, how do you stay in sync? You can't use your script to rebuild the table without editing it (Chance for error.).
Maybe he should pull rank and force you to format your scripts the same way the GUI creates them - just kidding.
I think you should flip on it..........