How to specify the TGT Kerberos ticket cache in beeline

I have a scenario where I want to make Hive JDBC connections as multiple users/principals. I can obtain multiple Kerberos tickets and store them in different cache files, for example one in /tmp/ticket1 and the other in /tmp/ticket2. But when I run beeline, how do I specify which ticket cache to use? I want to run queries as different users.

AFAIK you can't. The whole Hadoop ecosystem assumes that you use a ticket cache in the default location - even legitimate KRB5 environment variables are ignored (or just mess with some hardcoded defaults somewhere).
On my current assignment I had to develop a crude "Beeline emulator" in plain Java to handle that issue. It took me weeks to troubleshoot the KRB and the GSS configuration issues, including weird syntax inconsistencies between various versions of OpenJDK and Sun JRE (plus Linux vs. Windows), but finally I got it working.
And no, I will never share it with anyone outside of my Big Corp client with Big Lawyer staff...
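To illustrate the general shape of that kind of plain-Java workaround (this is only a sketch, not the emulator mentioned above): log in through JAAS with a Krb5LoginModule entry that names an explicit ticket cache, then do the Hive JDBC work inside Subject.doAs(). The JAAS entry name "HiveClient", the host, realm and cache path below are placeholders, and a real deployment typically needs additional GSS/Kerberos system properties and Hadoop client configuration - exactly the kind of troubleshooting described above.

A jaas.conf along these lines, passed with -Djava.security.auth.login.config=jaas.conf:

HiveClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  ticketCache="/tmp/ticket1"
  doNotPrompt=true;
};

And the Java side, roughly:

import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;

public class TicketCacheQuery {
    public static void main(String[] args) throws Exception {
        // Log in through the "HiveClient" JAAS entry above, which points at
        // /tmp/ticket1 instead of the default ticket cache location.
        LoginContext lc = new LoginContext("HiveClient");
        lc.login();
        Subject subject = lc.getSubject();

        // Run the JDBC work as the logged-in subject so GSS uses its credentials.
        Subject.doAs(subject, (PrivilegedExceptionAction<Void>) () -> {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // "principal" here is the HiveServer2 service principal, not the client's.
            String url = "jdbc:hive2://hs2host:10000/default;principal=hive/_HOST@EXAMPLE.COM";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SHOW DATABASES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
            return null;
        });
    }
}

Run it once per ticket cache (or per JAAS entry) to execute queries as different users.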

Bitvise SSH Client installation error. CreateDirectory() failed: Windows error 5: Access is denied

I'm trying to install the Bitvise SSH Client, but it won't install and throws this exception:
Exception caught:
Failed to create directory "C:\Program Files(x86)\Common Files\Bitvise"
CreateDirectory() failed: windows error 5: Access is denied.
My system is 64-bit; I know Bitvise ships a single installer that supports both 64-bit and 32-bit.
I also tried "Run as administrator", but I get the same exception. Could anyone tell me the correct procedure to install it?
Logging: Always create an MSI log for debugging when encountering any deployment problems. See that link for hints on interpreting the log file content. Search for "value 3" first of all:
msiexec.exe /i C:\Path\Your.msi /L*vx! C:\Your.log
In general: check vendor web sites and / or user forums to figure out details on known issues. It could be a permission issue on your TEMP folder.
Emergency Approach: Use a clean virtual machine to get the software running. Try different OS versions. This is just to get something working in a pinch. Or try someone else's computer. Obvious, yes, but try it if you can.
Keep in mind that "very clean" virtuals (there is absolutely nothing on there - just a fresh OS) could lack certain runtimes that might be "taken for granted" and hence missing from an installer. VCRuntime, .NET versions and such. Just in case you see mysterious errors there too.
First Checks: A simplified, generic check-list for deployment issues:
AD / Group Policies: Corporate environments could have group policies and restrictions preventing the installation of anything at all. Check that first.
Installation Media: Re-download installation media to ensure its integrity.
Corrupted by Malware: Note that malware or other factors can corrupt downloaded files, but more commonly they are corrupted in transit.
Wrong Bitness: The setup could be the wrong bitness (x64 on 32 bit system) or architecture such as Itanium (incompatible with normal x64 systems). Or even the wrong OS (zip file wrappers etc...).
Corrupted / Quarantined by Scanners: Security suites, firewalls, corporate blocks and the like can cause problems (separate issue below - not sure whether anti-virus programs still try to clean binaries? Block them they certainly do).
Incomplete Download: Launching before download is fully finished (premature launch) is a classic weirdness - error messages are generally ok, but can be misleading. Remember to allow anti-virus scanners to complete their post-download scan. This can take much longer than you think (they hash the file, check their site, etc...).
Download Mirror Issue: Sometimes the download comes from a number of download servers, some of which could be corrupted or contain faulty media or be misconfigured. Download again - check with virustotal.com and repeat a few times to verify. Have your colleague in another office download? Different mirror likely (automatic load-balancing - when you can't pick another server yourself).
Network Problems (LAN): When you have problems, copy the installation files to a local location (the desktop will do) to eliminate LAN network issues as the source of your deployment problem. If there are network problems, the file copy itself may fail with a clearer warning message. Keep the usual network-related fallacies in mind. More on this towards the bottom.
Missing Runtimes: A few, very core-runtimes can make setups fall over. This is particularly common on virtual machines that are "fresh" and basic.
Examples would be: VCRedist (in particular), .NET, Powershell, etc...
Missing more advanced components such as IIS, MSSQL, .NET Core, Java, etc. can also make some badly authored setups fall over.
Admin Rights: Ensure you have real admin rights on the box in question. In other words you are logged on using a real administrator account. Avoid "run-as" if you have a failure to look at. Try a real login.
Reboot: Just to try the obvious. Reboot and allow the PC to "settle down after reboot". This means you ensure that Windows Update hasn't started installing - or something else that was set to start pending the next reboot. PCs that are seldomly rebooted can have a lot going on after a reboot - some try to "reboot twice" - or even several times - to make sure all locks and blocks from "stuff that is happening" are released. Make sure to allow update operations to finish before rebooting once, twice or more (wait for reboot prompt). Virtual machines that are reverted to a previous state can be a nightmare when it comes to things that automatically start to update and cause confusion and problems.
Clean Slate: If you don't reboot, close down all applications before running your setup. This sorts out various locks and blocking happenstances. Preferably reboot first and run the setup the first thing you do when the machine is back up again. Again: give the machine enough time to be idle - everything started (services and such - and no updates installing).
Disk Space & Integrity: Ensure available free disk space AND that there are no errors on disk. The very small SSD and NVME disks of the last few years have made this problem more acute again.
Different user: Try installing as a different and real admin user. The important thing here is that this is a different admin account than you first tried (user account profile issues). So, in other words log in as a real admin user and don't just use "run as" (create a new account if you need to). An example of a problem could be someone who has messed up their user profile shell folder settings so that the directory table resolution of MSI fails. Another user profile would normally be unaffected and still work OK.
ACL - Access Control: Very often access denied can be related to custom NTFS ACL configuration that is erroneous. This can lead to weird error messages during installation. In corporate environments - with application packagers adapting installers - ACLs are sometimes modified extensively to tighten security. I have seen this a lot, but there are also other sources of ACL changes such as system administrator scripts, malware and I saw issues after a security fix from Windows Update a few years ago. Tightened security can trigger a lot of errors previously unseen in software that should "know better".
Malware check: Run anti-virus or Windows Defender to verify that you don't have a malware issue on your box. Additionally check the installation media with https://www.virustotal.com/ to ensure it is not malware itself! (the setup.exe could be infected, or the whole product could be malware outright - never know).
Security Software Interference: Anti-virus, firewalls, scanners and other security products can be overactive and block access to a folder or a resource so it looks like it is an ACL permission issue. Disable temporarily if possible when required. Do anti-virus software still try to fix binaries in the age of digital certificates? I am not sure. Always check installation file using virustotal.com.
Localized Setups: Sometimes setups made for other languages than English - or rather another language than the original setup (could be any language) - fail on systems with other languages installed and in use. Try on a clean virtual with the "setup-expected language". Problems like these indicate VERY bad setup design (hard coded localized paths, incorrect server paths or addresses due to translation errors, etc...) - but due to QA resources they are not uncommon. In essence the main-language version is generally (in almost all cases) put through better testing.
Mount Points: Some disks have mounted drives in folders and such things - this can cause some seriously weird problems. Try on a clean virtual with no drama-settings.
NTFS / FAT32: (Somewhat edge-case). It is no longer possible to install Windows 10 on a FAT32 drive - with the limitations that entails (no ACL permissions, max 4 GB files, no journaling and such). However, the setup could be redirected to a non-system FAT32 partition or some other disk format. This could trigger security problems (no ACL permissions), but should not generally create any access-denied issues - barring any custom actions trying to apply ACL permissions and failing (this might degrade gracefully by now, I don't know). However, the 4 GB file-size limit of FAT32 might actually trigger errors these days for huge setups (games, video files, etc.). Note that older Windows versions might still allow FAT32 system partitions. And finally - and importantly - FAT32 is not a journaling file system, which means data corruption can easily happen without self-correction.
Flagged Downloaded File: Newer versions of Windows flag downloaded files with "This file came from another computer and might be blocked to help protect this computer" (visible on the file's Properties page). More details on the feature: Digital signatures, false positives, tagged downloaded file. Just make sure your file does NOT have this flag (I do not have a complete overview of all problems that can result from it) - see the command just below.
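If the file does carry the flag, one way to clear it - assuming PowerShell is available, and with a placeholder path - is:

Unblock-File -Path "C:\Downloads\YourSetup.exe"    # removes the Zone.Identifier "blocked" marker

(The same thing can be done from the file's Properties page via the Unblock checkbox.)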
Odds and Ends: There are additional things, such as setups being too old to install properly (they don't handle modern Windows features well - you can try running the setup in compatibility mode by enabling it on the property page of the setup.exe file itself), and older Installshield setups had lots of DCOM-based InstallScript engine issues and such things. Other setup vendors have their own problems - quite a few of them for older setups. Brand new stuff and ancient stuff - always surprises.
Network Problems (LAN): This is mentioned above in the "Installation Media" section. You can copy files locally to eliminate LAN network problems as a source of the problem (SAMBA problems, network overload and packet loss, interfering scanners, timeouts, etc.). You might get a clearer error message when you try the local copy. Try downloading the file directly from the vendor's site to the desktop as a test. Keep the network-related fallacies in mind.
Update: Incompatibilities: It happens that software can't co-exist properly. These situations can be rather obvious (COM version incompatibilities, setups designed to detect existing software and prevent themselves from installing, setups in different language versions quarreling, etc.) or quite hard to work out (deep-seated driver problems, hardware peculiarities, anti-virus false positives or otherwise unsolvable problems). Make sure you test your setup on a clean virtual whenever you have problems. You can also use that as a "solution" if it works - have people run incompatible software on virtuals - obviously one of the key use cases for virtual machines (there are many others).
Full Check List: See the section "Generic Tricks" in this answer for more.
Smartscreen issues: Digital signatures, false positives, tagged downloaded file

Perl6: rakudobrew cannot build moar

I'd like to upgrade to the newest version of Perl6, so I ran:
rakudobrew build moar
Update git reference: rakudo
Cloning into 'rakudo'...
fatal: unable to connect to github.com:
github.com[0: 140.82.114.4]: errno=Connection timed out
Failed running git clone git://github.com/rakudo/rakudo.git rakudo at /home/con/.rakudobrew/bin/rakudobrew line 57.
main::run("git clone git://github.com/rakudo/rakudo.git rakudo") called at /home/con/.rakudobrew/bin/rakudobrew line 397
main::update_git_reference("rakudo") called at /home/con/.rakudobrew/bin/rakudobrew line 368
main::build_impl("moar", undef, "") called at /home/con/.rakudobrew/bin/rakudobrew line 115
This is just a simple connection failure, but how do I fix it?
Your connection problem is not really anything to do with any P6 related software, or in fact any software you're using. It is, as you say, "just a simple connection failure". And most such failures are transient and "fix themselves". As JJ notes, in such scenarios you just wait and then things start working again.
So by the time you read this it'll probably be working for you again without you having fixed anything. But I'm writing an answer anyway with these sections:
Consider not using rakudobrew
Connection problems that "fix themselves"
Connection problems you investigate or fix yourself
Getting around single points of failure
Consider not using rakudobrew
The main purpose of rakudobrew is to support installation of many versions of Rakudo simultaneously and the main audience for the tool is folk hacking on the Rakudo compiler, not those merely using it.
If you're just a regular user, not someone developing the Rakudo compiler, and/or don't need to have multiple versions of Rakudo, with complete source code, installed simultaneously, then consider just downloading and installing Rakudo files directly, e.g. via rakudo.org/files, rather than via rakudobrew.
Connection problems that "fix themselves"
rakudobrew failed because a git clone ... command failed, which in turn happened because the connection to the github.com server timed out.
A server timing out when doing something that usually works using a connection that usually works is likely a transient problem, aka a "please try later" problem.
Transient problems typically "fix themselves" a few seconds, minutes or hours later.
If there's still a problem when you try again, and you want to spend time trying to find out what's going on officially, then look for a status page for that server.
Here are two status pages I know of for github.com:
https://www.githubstatus.com/
https://twitter.com/githubstatus?lang=en-gb.
And for unofficial scuttlebutt I suggest reading the twitter feed.
For me, right now, github.com is working fine and the status page says all systems are go.
So it should now be working for you too.
If it's not, then you can wait longer, or investigate. If you want to investigate, start by looking at the status pages above.
Connection problems you investigate or fix yourself
If github claims it's working fine then there's presumably a problem with your local internet "on-ramp" (your system or your internet service provider's) or somewhere further afield between your on-ramp and the server you're failing to connect to. (You can only know approximately where the server is based on which region of the world administers the IP address the server is associated with at any given moment.)
The next place to look will be places like the internet traffic report; this indicates traffic jams and the like across the planet. (Ignore the visual display, which is broken on some browsers, and click on the links in the table to drill down.)
If it's all green between you and the region that administers the IP address of the server you're failing to connect to, then the next place to turn would be your system's administrator and/or ISP.
Failing that, then perhaps you can ask a question at a sister stackexchange site like serverfault.com or superuser.com.
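Before escalating, one quick local check is whether the same repository answers over https, since the git:// protocol runs on port 9418 and is frequently blocked by corporate firewalls (the commands below are just an example of that comparison):

git ls-remote https://github.com/rakudo/rakudo.git HEAD
git ls-remote git://github.com/rakudo/rakudo.git HEAD

If the first command answers and the second hangs or times out, the culprit is the protocol/port rather than GitHub being down.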
Getting around single points of failure
Perhaps you were thinking there might be some system redundancy and/or you're interested in that aspect.
P5's ecosystem and its tools are generally mature and limit spofs. This is unlike the ecosystems and tools of most of the other languages out there; so if you've gotten used to the remarkable reliability/availability of CPAN due to its avoidance of spofs, and by extension perlbrew, well, you've been spoiled by P5.
The P6 ecosystem/tool combinations are evolving in the P5 tradition.
For example, the zef package manager automatically connects to CPAN alongside github, and is built to be able to connect to other repos. The ecosystem is partway there to take advantage of this zef capability in that many modules are redundantly hosted on both CPAN and github.
rakudobrew ignores CPAN and assumes use of git repos. It is impressively configurable via its Variables.pm file which includes a %git_repos variable, which could be re-targeted to an alternative git repo site like gitlab. But no one has, to my knowledge, arranged to redundantly copy and update the relevant rakudo files to another git repo site, so this spof-avoidance ability apparently inherent in rakudobrew's code is, afaik, moot for now.

What would cause SSIS to ignore Package Configuration Connections?

I have a very simple SSIS Package that has 2 connections defined in the Connection Manager section: an MS Access Data Source and an MS SQL Data Source Destination. All this package does is truncate a table in the SQL Destination and import data from MS Access into the SQL table. This works as expected during Development within VS2013.
Now, I also have enabled Package Configurations for the package and have a couple of XML Configuration files (1 for each Connection) in a folder on the root of the C: drive. The Configuration file connections differ based on the server where they reside, but the folder structure exists on both servers so the package can execute against the server from which it is run.
I've checked the box to enable Package Configurations and deploy the package to 2 different Servers. 1 for Development and the other for QA. When I execute the package via the SSMS Integration package execution on my Development Server, the package utilizes the Development table. But when I execute the same package on my QA environment, it also utilizes the Development table.
Since the Development connection is the one that is embedded in the package via the Connection Manager, it appears (presumably anyway) that the package is using the embedded connection and ignoring the configuration files.
I have alternatively explicitly added the path to the Configuration file within the Execute Package Utility in the Configurations section to see if it made any difference, but the results are the same. The configuration file is not acknowledged. So it again appears that the package is using the embedded connections that are defined in the Connection Managers.
I suppose I "may" be able to remove the connections from the package in the Connection Managers section, turn off validation during design time, and then deploy again in an effort to force the package to use the config files, but that doesn't seem like the way to go and is a hack at best - provided it would even work.
Not that I think it should make a difference but to provide more detail, here is a bit more concerning my Server Configuration:
Development - SQL 2014 [ServerName]
Quality Assurance - SQL 2014 [ServerName]\[InstanceName]
I don't recall ever having this issue before, hence my reason for posting.
OK, since I am working against a deadline, I was hoping to acquire an answer sooner rather than later. Since that wasn't the case, and because I've seen variations of this question before without a definitive answer (at least for this scenario), I performed some tests and am posting this for others who may also need this information.
The Following Conditions will ignore the use of Configuration Files even if Package Configurations are enabled in an SSIS Package. These findings are based on actual tests and affirmed to be true for SQL 2014 although prior versions may also be applicable.
Disclaimer: These tests focused on the Configuration Files as they pertained to actual Server Connections. (E.g. Connection Strings) and not any other variables although it’s conceivable that any other values within the Configuration file would also be affected.
Execution of the package from within SSMS while connected to the Integration Services component and selecting Run Package. The noted behavior is that whatever connection value was captured prior to deployment to the server is the one that will be used, irrespective of the Configuration Files.
Note: This holds true even if configurations are added in the Configurations section prior to execution. Although there is a note that the configurations are not imported and cannot be edited, the fact is they were not used during testing either.
If a SQL Agent job is of type SQL Server Integration Services Package and no Configuration File references are actually added to the Configurations tab, the job will execute with whatever values were in place during the last build in BIDS prior to deployment (the embedded values).
If multiple configuration files are used by the package but some are omitted in the Configurations tab of the job, the job will use the Configuration Files that are designated but will fall back to the last values used in Development (the embedded values) for those not present in the context of the job.
Some of these behaviors are not very obvious, and I'd imagine it could be a frustrating puzzle for someone who, following the rules of most online tutorials on Package Configuration files, expected more straightforward results.
Identifying the root cause took me a time-consuming round of testing, and although I'm not an expert, I'm certainly far from a novice with SSIS.
At any rate, I hope this saves someone else hours of work and investigation.
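As a footnote to the findings above: when testing outside of SSMS, naming the configuration file explicitly on a dtexec command line at least removes one variable (the paths below are placeholders, and whether those values actually win still follows the behaviors described above):

dtexec /File "C:\Packages\AccessToSql.dtsx" /ConfigFile "C:\SSISConfig\QA_Connections.dtsConfig" /Reporting EW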

Cache Credentials During SVN Merge

A merge from a feature branch to trunk took over 45 minutes to complete.
The merge included a whole lot of jars (~250MB), however, when I did it on the server with the file:// protocol the process took less than 30 seconds.
SVN is being served up by Apache over https.
The version of SVN on the server is
svn, version 1.6.12 (r955767)
compiled Sep 3 2013, 17:49:49
My local version is
svn, version 1.7.7 (r1393599)
compiled Oct 8 2012, 20:42:17
On checking the Apache logs, I saw that the merge made over 10k requests, and apparently each of these requests went through an authentication layer.
Is there a way to configure the server so that it caches the credentials for a period and doesn't make so many authentication requests?
I guess the tricky part is making sure the credentials are only cached for the life of single svn 'request'. If svn merge makes lots of unique individual https requests, how would you determine how long to store the credential for without adding potential security holes?
First of all, I'd strongly suggest you upgrade the server to 1.7 or 1.8, since 1.7 and newer servers support an updated version of the protocol that requires fewer requests for many actions.
Second, if you're using path-based authorization you probably want SVNPathAuthz short_circuit in your configuration. Without it, whenever authorization is run for secondary paths (i.e. paths not in the request URI), as happens for many recursive requests (especially log), each check runs back through the entire Apache httpd authentication infrastructure. With the setting, instead of running httpd's whole authentication/authorization machinery, mod_authz_svn is simply asked to authorize the action against the path. Running through the entire httpd infrastructure can be especially painful if you're using LDAP and it needs to go back to the LDAP server to check credentials. The only reason not to use the short_circuit setting is if you have some other authentication module that depends on the path; I've yet to see such a setup in the wild, though.
Finally, if you are using LDAP, then I suggest you configure credential caching, since this can greatly speed up authentication. Apache httpd provides the mod_ldap module for this, and I suggest you read its documentation.
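As a rough illustration only - the repository paths, LDAP URL and cache sizes below are placeholders rather than recommendations - the relevant pieces of an httpd configuration might look something like this:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn/repos

  # Ask mod_authz_svn directly about secondary paths instead of
  # re-running the full httpd authentication/authorization stack.
  SVNPathAuthz short_circuit
  AuthzSVNAccessFile /etc/svn/authz

  AuthType Basic
  AuthName "Subversion repository"
  AuthBasicProvider ldap
  AuthLDAPURL "ldap://ldap.example.com/dc=example,dc=com?uid" NONE
  Require valid-user
</Location>

# mod_ldap caching so every request doesn't go back to the LDAP server
LDAPSharedCacheSize 500000
LDAPCacheEntries 1024
LDAPCacheTTL 600
LDAPOpCacheEntries 1024
LDAPOpCacheTTL 600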
If you provide more details of the server side setup I might be able to give more tailored suggestions.
The comments suggesting that you not put jars in the repository are valuable, but with some configuration improvements you can help resolve some of your slowness anyway.
The merge included a whole lot of jars (~250MB)
That's your problem! If you go through your network via http://, you have to send those jars via http://, and that can be painfully slow. You can increase the cache size of Apache httpd, or you can set up a parallel svn:// server, but you're still sending a quarter of a gigabyte of jars over the network. That's why file:// was so much faster.
You should not be storing jars in your Subversion repository. Here's why:
Version control gives you a lot of power:
It helps you merge differences between branches
It helps you follow the changes taking place.
It helps you identify a particular change and why it took place.
Storing binary files like jars gives you none of that. You can't merge binary files, and you can't track their changes.
Not only that, but version control systems usually use diffs to track changes, which saves a lot of space. Imagine a 1-kilobyte text file that goes through five revisions, with six lines changed in total. Instead of taking up 6K of space, only the original 1K plus those six changed lines are stored.
When you store a jar and then a new version of that jar, you can't easily diff them, and since the jar format is zip, you can't really compress them either. Store five versions of a jar in Subversion and you store pretty close to five times the size of that jar: if a jar file is 10K, you're using about 50K of space for it.
So not only do jar files take up a lot of space while giving you none of that power, they can quickly take over your repository. I've seen sites where over 90% of an 8-gigabyte repository was nothing but compiled code and third-party jars. And the useful life of these binary files is really quite limited too. So, in these places, 80% of their Subversion repository is wasted space.
Even worse, you tend to lose track of where you got that jar and what is in it. When users put in a jar called commons-beans.jar, I don't know what version that jar is, whether that jar was built by someone, or whether it was somehow munged by that person. I've seen users merge two separate jars into a single jar for ease of use. If someone calls that jar commons-beanutils-1.5.jar because it was version 1.5, it's very likely that someone will update it to version 1.7 but not change the name. (It would affect the build, you have to add and delete, there is always some reason.)
So, there's a massive amount of wasted space with little benefit and almost no information. Storing jars is just plain bad news.
But your build needs jars! What should you do?
Get a jar repository like Nexus or Artifactory. Both of these repository managers are free and open source.
Once you store your jars in there, you can fetch the revision of the jar you want through Maven, Gradle, or - if you use Ant and want to keep your Ant build system - Ivy. You can also, if you don't feel like being that fancy, fetch the jars via an Ant <get/> task. If you use Jenkins, Jenkins can easily deploy the built jars to your Maven repository for other projects to use.
So, get rid of the jars. Merging will then be a simple diff between text files. Merging branches will be much quicker, and less information has to be sent over the network. If you don't want to switch to Maven, then use Ivy, or simply update your builds with the <get> task to fetch the jars and versions you need.
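If you do keep Ant, a pinned-version fetch with the <get> task might look roughly like this (the repository URL and version are made up for illustration):

<target name="fetch-deps">
  <mkdir dir="lib"/>
  <!-- Pull the exact version from the repository manager instead of committing the jar -->
  <get src="https://repo.example.com/releases/commons-beanutils/1.7/commons-beanutils-1.7.jar"
       dest="lib/commons-beanutils-1.7.jar"
       usetimestamp="true"
       verbose="true"/>
</target>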

Requested registry access is not allowed on remote box

We have developed a somewhat diffuse system for handling component installation and upgrades across server environments in an automated manner. It worked happily on our development environment, but I've run into a new problem I've not seen before when attempting to deploy it to a live environment.
The environment in question comprises ten servers, five each on two different geographical sites and domains. Each server runs a WCF based windows service that allows it to talk to each of the other servers and thus keep a track of what's installed where. To facilitate this process we make use of machine level environment variables - and modifying these obviously means registry changes.
Having got all this set up, my first attempts to use the system to install stuff seemed to work, but on one box in particular I'm getting "Requested registry access is not allowed" errors when the code tries to modify the environment variables. I've googled this, obviously, but there seem to be a variety of different causes and I'm really not sure which are the applicable ones. It doesn't help that this is a live environment and that our system has relatively limited internal logging capability.
The only clue I've got is that the guy who did the install on the development boxes wrote a very patchy set of documentation on the process. This includes an instruction to set the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\LocalAccountTokenFilterPolicy value in the registry to 1. I skipped this during the installation as it looked like a rather dubious security risk. Reading the documentation about this key, it looks relevant, but my initial attempts at installing stuff on other boxes without this setting enabled worked fine. Sadly the author went on extended leave over the holidays yesterday, and he left no explanation of why this key was needed, so we're a bit in the dark.
Can anyone help us toward the light?
Cheers,
Matt
I've seen this error when code tries to write to the event log using something like EventLog.WriteEntry() and a source that is not a registered event source is specified. When a source is specified that has not previously been registered, it will attempt to register the source, which involves writing to the registry.
I would suggest taking a look at SysInternals Process Monitor:
http://technet.microsoft.com/en-us/sysinternals/bb896645
You can use this to monitor registry access and find out what key you're getting the access denied error on. This may give you some insight as to what is causing the problem.
Essentially he's disabling part of the Remote User Account Control. Without setting the value, Remote UAC strips administrative privileges from account tokens remotely accessing the machine. Yes, it does have security implications. See Description of User Account Control and remote restrictions in Windows Vista for an explanation.