Custom authentication in Impala (without kerberos/LDAP) - authentication

We have a big data cluster that we built by installing the tarballs directly from the Cloudera website. We currently run Hive, Impala, Hadoop, Spark, and Kafka. The current setup has no authentication or authorization.
We are in the process of adding authentication and authorization; however, we decided not to use Kerberos, to avoid the hassle of setting up a KDC server.
We were able to set up Sentry for authorization, and for authentication we are using Hive Custom authentication, wherein we validate user credentials through an internal REST API as described here.
We are trying to set up a similar authentication mechanism for Impala; however, we have not been able to figure out a way to do Custom authentication in Impala.
Please let us know if, apart from LDAP/Kerberos, there is an alternative way to authenticate a user, something equivalent to Hive Custom authentication.

Related

Add Github Identity Provider to AWS Cognito

I have created a GitHub OAuth app and I am trying to add the app as an OIDC application to AWS Cognito.
However, I cannot find a proper overview of the endpoints and the data to fill in anywhere in the GitHub docs.
The following fields are required:
Issuer => ?
Authorization endpoint => https://github.com/login/oauth/authorize (?)
Token endpoint => https://github.com/login/oauth/access_token (?)
Userinfo endpoint => https://api.github.com/user (?)
Jwks uri => ?
I couldn't find the Jwks uri anywhere. Any help would be highly appreciated.
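It seems there is no way to get this working out of the box: GitHub OAuth apps implement plain OAuth 2.0 rather than OpenID Connect, so there is no issuer or JWKS URI to fill in.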
https://github.com/TimothyJones/github-cognito-openid-wrapper seems to be a way to get this working.
If any Cognito dev sees this, please add GitHub/GitLab/Bitbucket support.
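To give a rough idea of the kind of translation a wrapper has to perform for the userinfo endpoint, here is a minimal sketch that maps a GitHub `/user` JSON object onto standard OIDC claims. The function name and the choice of claims are assumptions made for illustration; this is not the code of the wrapper linked above.

```python
def github_user_to_oidc_claims(user: dict) -> dict:
    """Map a GitHub `/user` JSON object onto OIDC standard claims.

    Hypothetical mapping for illustration only.
    """
    claims = {
        "sub": str(user["id"]),                   # OIDC expects `sub` to be a string
        "preferred_username": user.get("login"),
        "name": user.get("name"),
        "email": user.get("email"),               # may be missing if the user hides it
        "picture": user.get("avatar_url"),
        "profile": user.get("html_url"),
    }
    # Drop claims that GitHub did not supply.
    return {key: value for key, value in claims.items() if value is not None}
```

The issuer and JWKS URI cannot be derived this way; they would have to be served by whatever service sits between Cognito and GitHub.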
GitLab 14.7 (January 2022) might help:
OpenID Connect support for GitLab CI/CD
Connecting GitLab CI/CD to cloud providers using environment variables works fine for many use cases.
However, it doesn’t scale well if you need advanced permissions management or would prefer a signed, short-lived, contextualized connection to your cloud provider.
GitLab 12.10 shipped initial support for JWT token-based connection (CI_JOB_JWT) to enable HashiCorp Vault users to safely retrieve secrets. That implementation was restricted to Vault, while the logic we built JWT upon opened up the possibility to connect to other providers as well.
In GitLab 14.7, we are introducing a CI_JOB_JWT_V2 environment variable that can be used to connect to AWS, GCP, Vault, and likely many other cloud services.
Please note that this is an alpha feature and not ready for production use. Your feedback is welcomed in this epic.
For AWS specifically, with the new CI_JOB_JWT_V2 variable, you can connect to AWS to retrieve secrets, or to deploy within your account. You can also manage access rights to your cluster using AWS IAM roles.
You can read more on setting up OIDC connection with AWS.
The new variable is automatically injected into your pipeline but is not backward compatible with the current CI_JOB_JWT.
Until GitLab 15.0, the CI_JOB_JWT will continue to work normally but this will change in a future release. We will notify you about the change in time.
The secrets stanza today uses the CI_JOB_JWT_V1 variable. If you use the secrets stanza, you don’t have to make any changes yet.
See Documentation and Issue.
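When experimenting with this, it can help to look at the claims inside the token before configuring a trust policy on the cloud side. The sketch below decodes the payload of a JWT without verifying its signature, so it is only suitable for inspection. The function name is hypothetical, and reading the token from `CI_JOB_JWT_V2` in the usage note assumes the script runs inside a GitLab CI job.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload of a JWT WITHOUT verifying its signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64url padding JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))
```

Inside a job this could be called as `decode_jwt_claims(os.environ["CI_JOB_JWT_V2"])`. The cloud provider still has to verify the signature against GitLab's published keys; this helper does not do that.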

Use keycloak as auth service or IDP?

So, I'm doing research to find out whether Keycloak is a good fit for the environment I'm working in.
I'm using LDAP to manage users at my workplace. I was wondering if there is a way to use Keycloak as the auth service for all upcoming systems and some of the existing ones. We currently manage this with an IDP that we need to improve or replace; some systems also use their own login (this will eventually change).
The main problem I've come across is that Keycloak synchronizes against LDAP, and I don't want user data to be stored in Keycloak, except maybe login data. User data is meant to be kept only in LDAP's database, in case any user data needs to be updated.
So is there a way to use Keycloak only as an auth service, fetching user credentials from LDAP on every auth request?
PS: maybe I am mistaken about what an auth service is and what an IDP is.
Actually it is not necessary that LDAP users are synced to Keycloak.
Keycloak supports both options:
Importing and optionally syncing users from LDAP to Keycloak
or
Always getting the user info from LDAP directly.
But Keycloak will always generate some basic federated user in its database (e.g. for keeping up a session when using OpenID Connect, but you should not really care about that).
As far as I know (but I've not used that myself), you could also use Keycloak to maintain the LDAP users' data and write changes back to LDAP (see "Edit Mode" in the Keycloak documentation).
Check the Keycloak documentation on LDAP for more information: https://www.keycloak.org/docs/6.0/server_admin/#_ldap
Besides the user data topic, Keycloak supports several protocols (such as SAML and OpenID Connect) for authenticating against your services. So you could use different or multiple authentication protocols, depending on your applications, with just one "LDAP backend".

securing Elasticsearch cluster on Elastic Cloud

What is the best way to secure a connection between an Elasticsearch cluster hosted on Elastic Cloud and a backend given that we have hundreds of thousands of users and that I want to handle the authorization logic on the backend itself not on Elasticsearch?
Is it better to create a "system" user in the native realm with full read and write access (it looks like the user feature is intended for real end users), or to use another type of authentication (but SAML, PKI, and Kerberos are also end-user oriented)? Or should I use other security measures, such as IP-based restrictions?
I'm used to Elasticsearch service on AWS where authorization is based on IAM roles so I'm a bit lost here.
Edit: 18 months later, there's no definitive answer on this. If I had to do it again, I would probably end up using JWT.

Keycloak with custom DB. Is it possible to back Keycloak with a custom DB without having to keep it in sync with the Keycloak DB?

Is it possible to use Keycloak with a custom DB? E.g. we have a database where we have all the users and their passwords. Can we use Keycloak with that database, or do we need to add each user to Keycloak and keep our user DB and the Keycloak DB in sync?
The answer is yes. Under User Federation, Keycloak supports LDAP and Kerberos, and you can also develop a custom User Federation provider.
Custom Provider explains how you can achieve your goal. But from here, it says you can migrate from the earlier User Federation SPI. That User Federation SPI can be implemented in Keycloak 2.4.0 with the APIs of that version. On the Keycloak website I could only find 1.9.0, which says there is a provider/federation-provider sample you can refer to. Maybe you can try to find the 2.4.0 bundle and sample to do your work.
After you create your User Federation SPI implementation, refer here to migrate. You can choose whether or not to import users.

User Authentication in hadoop Hdfs

I have integrated Milton WebDAV with Hadoop HDFS and am able to read/write files to the HDFS cluster.
I have also added the authorization part using Linux file permissions, so only authorized users can access the HDFS server; however, I am stuck at the authentication part.
It seems Hadoop does not provide any built-in authentication and users are identified only through the Unix 'whoami', meaning I cannot enable a password for a specific user.
ref: http://hadoop.apache.org/common/docs/r1.0.3/hdfs_permissions_guide.html
So even if I create a new user and set permissions for it, there is no way to tell whether the user is authenticated or not. Two users with the same username and different passwords have access to all the resources intended for that username.
I am wondering if there is any way to enable user authentication in HDFS (either intrinsic in any new Hadoop release or using a third-party tool like Kerberos etc.)
Edit:
OK, I have checked and it seems that Kerberos may be an option, but I just want to know if there is any other alternative available for authentication.
Thanks,
-chhavi
Right now Kerberos is the only supported "real" authentication protocol. The "simple" protocol completely trusts the client's whoami information.
To set up Kerberos authentication I suggest this guide: https://ccp.cloudera.com/display/CDH4DOC/Configuring+Hadoop+Security+in+CDH4
msktutil is a nice tool for creating Kerberos keytabs on Linux: https://fuhm.net/software/msktutil/
When creating service principals, make sure you have correct DNS settings, i.e. if you have a server named "host1.yourdomain.com", and that resolves to IP 1.2.3.4, then that IP should in turn resolve back to host1.yourdomain.com.
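The forward/reverse check described above can be scripted before you create the principals. This is a minimal sketch using only the Python standard library; the function name and the injectable `resolve`/`reverse` parameters are assumptions added so the check can be exercised without real DNS.

```python
import socket


def forward_reverse_match(hostname, resolve=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return True if hostname -> IP -> hostname round-trips to the same name."""
    ip = resolve(hostname)            # forward lookup, e.g. "1.2.3.4"
    reverse_name = reverse(ip)[0]     # gethostbyaddr returns (name, aliases, addresses)
    # Compare case-insensitively and ignore a trailing dot on fully qualified names.
    return reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
```

Calling `forward_reverse_match("host1.yourdomain.com")` on each node should return True; a False result (or a lookup exception) points at the DNS problem to fix first.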
Also note that Kerberos Negotiate authentication headers might be larger than Jetty's built-in header size limit; in that case you need to modify org.apache.hadoop.http.HttpServer and add ret.setHeaderBufferSize(16*1024); in createDefaultChannelConnector(). I had to.