nixos etcd.pem (kubernetes) - ssl

While trying to install Kubernetes on nixos, using the following stanza:
services.kubernetes.masterAddress = "XXXXXX";
users.users.XXXXXX.extraGroups = [ "kubernetes" ];
services.kubernetes = {
roles = ["master" "node"];
};
I hit the following issue:
open /var/lib/kubernetes/secrets/etcd.pem: no such file or directory
I recognize this as a TLS/SSL certificate, but how should I go about generating that file?

The article you used is really old. It was published 2017-07-21 so almost 2,5 years ago. You can be pretty sure it's outdated in one way or another however major NixOS approach to setting up kubernetes cluster from end user perspective may have not changed a lot during this time.
So, after familiarizing with it a bit more... I see that this is actually yet another approach to installing kubernetes cluster and it has nothing to do with "the hard way" I mentioned in my previous comment. On the contrary, it's the easiest kubernetes cluster setup I've ever seen. Actually you don't have to do anything but add a single entry in your configuration.nix and then run nixos-rebuild switch and you can expect everything to be up and running. But there is really a lot, not just a few things that NixOS takes care about "under the hood". Generating proper certificates is just one of many steps involved in kubernetes cluster setup. Keep in mind that Kubernetes installation from scratch is pretty complex task. Take a brief look at this article and you'll see what I mean. This is really amazing thing for educational purposes as there is probably no better way to understand something in-deep, than to build it from scratch, in the possibly most manual way.
On the other hand, if you just need to set up relatively quickly a working kubernetes cluster, Kubernetes the Hard Way won't be your choice. Fortunatelly there are a few solutions that give you possibility to set up your kubernetes cluster relatively quickly and simply.
One of them is Minikube.
The other one which gives you possibility to set-up multi-node kubernetes cluster is kubeadm.
Going back to NixOS, I'm really impressed by how simple it is to set up your kubernetes cluster on this system, provided everything works as expected. But what if it doesn't ( and this is mainly what your question was about ) ? You may try to debug it on your own and try to look for a workaround of your issue or simply create an issue on NixOS project github page like this one. As you can see someone already reported exactly the same problem as yours. They say that on the 18.09 release it works properly so probably you're using newer version like 19.03. You can further read that there were some major changes like moving to mandatory pki in 19.03.
Take a closer look at this issue if you're particularly interested in running kubernetes on NixOS as there are a few advices and workarounds described there:
https://github.com/NixOS/nixpkgs/issues/59364#issuecomment-485122860
https://github.com/NixOS/nixpkgs/issues/59364#issuecomment-485249797
First of all make sure that your masterAddress is set properly i.e. as hostname, not ip address. As you put there only "XXXXXX" I can't guess what is currently set there. It's quite likely that when you set it e.g. to localhost, appropriate certificate would be generated properly:
services.kubernetes = {
roles = ["master"];
masterAddress = "localhost";
};
You may also want to familiarize with this info in NixOS docs related with Kubernetes.
Let me know if it helped.

Related

CodeReady Containers: developer doesn't have permissions to see the Topology

In layman's/beginner's terms, how do you give the developer permissions to the Topology section and whatever else they need, to perform general development with CodeReady Containers, as described in the instructions?
I'm looking into using OpenShift, which initially looked promising. However Minishift and CodeReady Containers have both had issues working on my Windows machine, so now I'm trying to use CodeReady Containers on a Cent OS system.
After setting up CRC and opening the console in Firefox, I select the htpasswrd_provider option and log in as developer/developer, which is exactly what Red Hat's instructions/tutorials say to do. And as they recommend, I created a new project, then switched to the Developer view and tried going into the Topology section.
There I get this error:
deploymentconfigs.apps.openshift.io is forbidden: User "developer" cannot list resource "deploymentconfigs" in API group "apps.openshift.io" in the namespace "default"
Rolls eyes. This is contrary to their own directions. Furthermore Red Shift's tutorials act like the developer is initially the only user available, although the username itself can be admin, developer, or whatever else you want it to be. So this user really just needs to be able to work.
On a personal note, I have spent a lot of time researching containers and container orchestration the last few months, having come from a Windows/.NET background, and I am sick and tired of every "simplification" of Kubernetes and other, related technologies failing to work as described, having to be troubleshot and "repaired" straight out of the box. I have Googled this, but have spent enough time sifting through haystacks troubleshooting issues like this, and to some extent, I feel like this software and/or its documentation is just broken.
In layman's/beginner's terms, how do you give the developer permissions to the Topology section and whatever else they need, to perform general development with CodeReady Containers, as described in the instructions?

Liferay 6.2 Lucene replication in cluster

I'd welcome any help regarding simple issue: I have clustered environment and I enabled Lucene replication in properties (lucene.replicate.write=true). Now, all the tutorials are instructing me to reindex Lucene.
Should I run it on one node? On both? Simultaneously or sequentially?
This question has been asked in Liferay Forum as well: https://www.liferay.com/community/forums/-/message_boards/view_message/69175435.
Thank you!
Basically what I did at first was following:
cluster.link.enabled=true
lucene.replicate.write=true
and the result was NOT WORKING replication.
What I tried next was to overcome this issue and continue with clustering the rest of the portal which at the end helped lucene as well. My progress was to:
deploy cluster activation keys
deploy ehcache-cluster-web.war
portal-ext.properties:
cluster.link.enabled=true
cluster.link.autodetect.address=<COMMONLY_ACCESSIBLE_IP_AND_PORT>
lucene.commit.batch.size=1
lucene.commit.time.interval=5000
lucene.replicate.write=true
ehcache.cluster.link.replication.enabled=true
cluster.link.channel.properties.control=<PATH_TO_XML>
cluster.link.channel.properties.transport.0=<PATH_TO_XML>
portal.instance.protocol=http
portal.instance.http.port=8080
setenv.sh
-Djava.net.preferIPv4Stack=true
-Djgroups.bind_addr=<IP_OF_THE_NODE>
edit clusterlink_control and clusterlink_transport files by Liferay tutorials
when servers shutted down delete contents of data/lucene and in Control Panel run reindaxation on one node
At the end, Lucene replication IS WORKING. What I think could be significant is following. At first, portal.properties explanation on keys lucene.commit.* is kind of hard to comprehend. By trial and error I found out that these two keys are in AND relation. Also, I found out about portal.instance.* keys which are used for multiple purposes in clustering and can matter if you have loadbalancers and/or Apaches between the nodes and autodetect fails.
There are multiple ways to configure search clustering in Liferay. If you use the lucene.replicate.write=true way, you're looking at several reindexing runs: On every restart of a server you must reindex that server's documents, as it might have missed indexing requests when it was down.
So, short answer: Don't worry, reindex both. Sooner or later you'll do it anyways, no matter if you need only one now.

Ubuntu + PBS + Apache? How can I show a list of running jobs as a website?

Is there a plugin/package to display status information for a PBS queue? I am currently running an apache webserver on the login-node of my PBS cluster. I would like to display status info and have the ability to perform minimal queries without writing it from scratch (or modifying an age old python script, ala jobmonarch). Note, the accepted/bountied solution must work with Ubuntu.
Update: In addition to ganglia as noted below, I also looked that the Rocks Cluster Toolkit, but I firmly want to stay with Ubuntu. So I've updated the question to reflect that.
Update 2: I've also looked at PBSWeb as well as MyPBS neither one appears to suit my needs. The first is too out-of-date with the current system and the second is more focused on cost estimation and project budgeting. They're both nice, but I'm more interested in resource availability, job completion, and general status updates. So I'm probably just going to write my own from scratch -- starting Aug 15th.
Have you tried Ganglia?
I have no personal experience but few sysadmin I know are using it.
Following pages may help,
http://taos.groups.wuyasea.com/articles/how-to-setup-ganglia-to-monitor-server-stats/3
http://coe04.ucalgary.ca/rocks-documentation/2.3.2/monitoring-pbs.html
my two cents
Have you tried using nagios: http://www.nagios.org/ ?

AppFabric caching's local cache isnt working for us... What are we doing wrong?

We are using appfabric as the 2ndlevel cache for an NHibernate asp.net application comprising a customer facing website and an admin website. They are both connected to the same cache so when admin updates something, the customer facing site is updated.
It seems to be working OK - we have a CacheCLuster on a seperate server and all is well but we want to enable localcache to get better performance, however, it dosnt seem to be working.
We have enabled it like this...
bool UseLocalCache =
int LocalCacheObjectCount = int.MaxValue;
TimeSpan LocalCacheDefaultTimeout = TimeSpan.FromMinutes(3);
DataCacheLocalCacheInvalidationPolicy LocalCacheInvalidationPolicy = DataCacheLocalCacheInvalidationPolicy.TimeoutBased;
if (UseLocalCache)
{
configuration.LocalCacheProperties =
new DataCacheLocalCacheProperties(
LocalCacheObjectCount,
LocalCacheDefaultTimeout,
LocalCacheInvalidationPolicy
);
// configuration.NotificationProperties = new DataCacheNotificationProperties(500, TimeSpan.FromSeconds(300));
}
Initially we tried using a timeout invalidation policy (3mins) and our app felt like it was running faster. HOWEVER, we noticed that if we changed something in the admin site, it was immediatley updated in the live site. As we are using timeouts not notifications, this demonstrates that the local cache isnt being queried (or is, but is always missing).
The cache.GetType().Name returns "LocalCache" - so the factory has made a local cache.
Running "Get-Cache-Statistics MyCache" in PS on my dev environment (asp.net app running local from vs2008, cache cluster running on a seperate w2k8 machine) show a handful of Request Counts. However, on the Production environment, the Request Count increases dramaticaly.
We tried following the method here to se the cache cliebt-server traffic... http://blogs.msdn.com/b/appfabriccat/archive/2010/09/20/appfabric-cache-peeking-into-client-amp-server-wcf-communication.aspx but the log file had nothing but the initial header in it - i.e no loggin either.
I cant find anything in SO or Google.
Have we done something wrong? Have we got a screwy install of AppFabric - we installed it via WebPlatform Installer - I think?
(note: the IIS box running ASp.net isnt in yhe cluster - it is just the client).
Any insights greatfully received!
Which DataCache methods are you using to read from the cache? Several of the DataCache methods will always make a hit against the server regardless of local cache being configured. You pretty much have to make sure you only use Get if you want the local cache to be leveraged.
This is one my biggest nits with AppFabric Caching. They don't explain any of this to you and so when you begin to rely on local caching you begin to fall into these little pitfalls because you do not think you're paying a penalty for talking to the service, transferring data over the wire and deserializing objects, but you are.
The worst thing is, I could understand having to talk to the service to make sure the local cache represents the latest data. I can even understand transferring the data back so that multiple calls are not made. What I can not understand for the life of me though is that even if the instance in the local cache is verified to still be the current version that came back from the cache, they still deserialize the object from the wire rather than just returning the instance that's in memory already. If your objects are large/complex this can hurt a lot.
After days and days of looking into why we get so many Local Cache misses we finally solved it:
There is a bug with local cache in AppFabric v 1.1 that is fixed in CU4, see http://support2.microsoft.com/kb/2800726/en-us
Make sure that the Microsoft.ApplicationServer.Caching.Client.dll used by your application is also updated. We had CU4 installed on the machine but got the Client.dll without CU4 from a NuGet package in our application. In our case a simple NuGet package update made everything work.
After installing CU4 and making sure that the Client.dll was also updated we reduced our reads towards the AppFabric Host by a lot, due to Local Cache hits increasing. yay!
Have you tried using a nhibernate profiler? http://nhprof.com/
There is also this:
http://mdcadmintool.codeplex.com/
It's a nice way to manage and view the cache.
Both of these may help in debugging the issue.

Best IT/back-office system hacks? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Lots of people have things that their systems do for them or for their teams. Source control post-commit hooks are a standard example: have an automated build system that checks out the latest source, compiles, tests, and packages it is a back-office hack that most of us probably use.
What other cool things have you done?
We had one developer in our team who wasn't familiar with the concept of a subversion conflict. He deduced that if he simply deleted all that weird stuff in his code and clicked resolve that everything was ok (i.e. knocking out all the other changes in the file....)
Regardless to say, after the 5th time this occurred, and the 5th time that I had to explain why that defect that I just closed was reoccuring, I wrote a script.
It would diff for the changes to a file to see whether the consecutive checkin deleted all the previous changes and that they were done by the nameless developer.
It would then send an email to the boss with a description of what happened, and how much work was lost during the checkin.
There was no 7th occurrence.
We have a traffic-light that shows whether our daily build succeeds, has failed tests or simply doesn't build.
Also, we have a light bar that lights up for a few seconds whenever we receive an upload from a customer.
We aren't staffed 24x7 but we have critical processes that run throughout the night. We created an in-house alerts system to notify us of serious system issues, failed mission-critical processes, etc. It uses text-to-speech to create a descriptive message and then connects to our automated dialer to call the appropriate people with the message.
Working at a web design company I configured our dev server so we could see a working copy of a project in real time by a sub domain name. So if your name was joe and you were working on project jetfuel you would go to joe.jetfuel.test-example.com and you could see your changes instantly without committing.
This was a simple hack that used sub domain names as a partial directory structure. Our htdocs path looked like this htdocs/tag/project. We had a script (a php app that you would access by setup.test-example.com) that would create a new tag name for you and checkout whatever version you wanted and call the deploy script for that project. If it succeeded it would forward you to the new sub domain. You could then work on this new copy by a samba share.
This worked really well for us since we always deployed to the same linux build and our projects had simple database requirements.
Our original reason for doing this was because our developers worked on all kinds of different platforms. Besides fixing this platform problem this was awesome for viewing changes and testing. We had all kinds of tags ranging from peoples names, trunk versions, test tags, all the way to prototypes like jquery-menu-hack.jetfuel.test-example.com
Now that I look back I wonder how much easier it would have been to run virtual machines.
We had a dev working on a classic ASP site that didn't believe in source control. The code went from his machine straight to the production box. This lead to issues with lost changes or the inability to revert back to a stable version. Since CruiseControl.Net has the ability to monitor a directory, I added a project that actually checked in files whenever they were copied to production. Completely backward from CC.Net's original intent, but we didn't lose any more code.
Put in a pre-commit hook that checks the bug comment refers to an open bug, assigned to the user doing the checkin. (SCMBug can do this).
Then to make life REALLY interesting, spell check the comments!!
The commit comment, and the one in the code. (spell is my buddy)
Run the code through a code formatter set to compayn standard; and diff it to the original: if it's not in company offical format: reject the commit.
Do a coverage test with the unit test build.
Email all mistakes/errors caused to the development team.
I left OUT the name of the developer. They know they did it.
Not exactly hacks, but a couple of must-haves for IT dev work:
If you're using subversion, you've got to use CommitMonitor. (http://tools.tortoisesvn.net/CommitMonitor) It lets you monitor svn repositories for new commits & then review the new commits. Great if you're wanting to stay on top of what your team is doing. Particularly if you have a couple of juniors that need to be watched. ;)
Rsnapshot (http://www.rsnapshot.org/) is also invaluable - we have complete backup snapshots of our entire filesystem every four hours going back 2 years, and every day beyond that. It's like a data cube for your filesystem! The peace of mind this gives is pure bliss. :)
Hardly a hack, but back in the day, on our speedy VAX 11/730, our overnight process would print the file "BLAMMO.TXT" on the printer if something went amiss. Every morning, the first stop was the printer when coming in.
Back in the dotCom days about 9 years ago, I had to hack a failover system between two different locations. We had a funky setup with a powerbuilder front end website, and powerbuilder managment tool. Data was stored in MSSQL 7.0. The webservers used IPX to communicate to the SQL Servers (don't ask). Anyway, I was responsbile for coming up with a failover plan.
I ended up hacking together some linux boxes, and had them run our external DNS. One at each location. We had a remote site w/ webserver, and sql server I got SQL transaction replication working over a 128k ISDN IPX connection (of all things). Then built a monitoring tool at our production site to send packets out to various upstream network handoffs. If we experienced more than 20% outage the primary site, the monitoring tool ran a perl script on the Debian box to change DNS and point to our 2ndary. Our secondary had a heartbeat w/ our primary DNS, and monitoring station. It would duplicate records unless it lost both connections then it would roll over to pointing DNS to backup location.
The primary site would shut down the SQL server at the primary location to break replication. Automated site to site failover using 128k ISDN IPX connection :)
Back at my previous job, we had to audit many tables for data changes (inserts, updates and deletes). Our support crew had to be able to search through this data to find changes that users made.
The temporary solution that had become semi-permanent was to store each non-select query. However this was a large system, that the table would grow by about 1.5GB a day.
The solution I came up with was to create a script that for all tables in an external list, created the appropriate triggers that audit each table, row, column, before and after, when and by whom and store it in our new audit table. This table grew by about 10% the size of the older version and stored much more usable data. It enabled us to create a UI to search and view every change made to our data, without requiring any knowledge of SQL for our support team or business users.
This is at a lesser level, but I am fairly proud of a make file I wrote for compiling code for my research. It only needs to be given your source and header file names that can take care of the rest all by itself (though it does make the one assumption that you will not be compiling any header files into objects, only source files get compiled). The other downsides are the fact that it relies on the GNU make program's second expansion feature, so I don't know if it works on other make programs. Additionally the compiler used needs to support something similar to gcc's -MM feature. Here is hoping that no one laughs at it.
-include prereqs.mk
HEADERS=$(SRC_DIR)/gs_lib.h $(SRC_DIR)/gs_structs.h
SOURCES=$(SRC_DIR)/main.cpp $(SRC_DIR)/gs_lib.cpp
OBJECTS=$(patsubst $(SRC_DIR)/%.cpp,$(OBJ_DIR)/%.o,$(SOURCES))
release: FLAGS=$(GEN_FLAGS)$(OPT_FLAGS)
release: $(OBJECTS) prereqs.mk
$(CXX) $(FLAGS) $(LINKER_FLAGS) $(OUTPUT_FLAG) $(EXECUTABLE) $(OBJECTS)
prereqs.mk: $(SOURCES) $(HEADERS)
$(CXX) $(DIR_FLAGS) $(MAKE_FLAG) $(SOURCES) | sed 's,\([abcdefghijklmnopqrstuvwxyz_]*\).o:,\1= \\\n,' > $#
.SECONDEXPANSION:
$(OBJECTS): $$($$(patsubst $(OBJ_DIR)/%.o,%,$$#))
$(CXX) $(FLAGS) $(NO_LINK_FLAG) $(OUTPUT_FLAG) $# $(patsubst $(OBJ_DIR)/%.o,$(SRC_DIR)/%.cpp,$#)
Obviously I dropped the definition of a number of variables, but I think it gets the idea across.
Since my coding tools and style are compatible with the requirements of this script I like to use it. All I need to do to add (a) new piece(s) of source code is add its name(s) to the appropriate variable and the rest is taken care of.
We have Twitter accounts for many projects which tweet things like commit messages, notices from builds, failed unit tests, deployments, bug tracking activity - any kind of event associated with the project. Running a client like Twitter Gwibber (which displays a pop-up for each new status) is a great way to stay in touch with the activity on the projects you are interested. Using Twitter is good as you can take advantage of all the 3rd party apps - such as the iPhone clients.
Add commit-hook check for VRML/3d-model files with absolute path to textures/images. f:/maya/my-textures/newproject/xxxx.png just doesn't belong on the server.
Back in the 1993, when source control systems were really expensive and unwieldy, the company I worked about had an in-house source control built as 4DOS scripts. It wasn't as sofisticated as most current source control systems, for example it didn't have branching or integrates, but it did the basic job of supporting revisions history, checkout/checkin and rudimentary conflict resolution.