Bringing In The New Year With a Cluster of Network Problems

I always seem to have issues, particularly with my network, in clusters.

I caught the flu on New Year’s Eve and was out of service for about a week. During the first day or two of being sick, my R710, which runs Proxmox and all of my VMs, kernel panicked. I didn’t get any details on the exact cause because I was too out of it. I tried to restart it, but had no luck; it never managed to fully boot. While I was sick, I reinstalled Proxmox to a new RAID 1 array (PVE was previously installed on a flash drive, which I suspect had something to do with the problem) and restored all of my backed-up VMs. I was still pretty out of it while I did this, but everything worked fine afterward, and I was relieved that everything was working again – Home Assistant was controlling all of the outside lights, the telephone system was working, and the websites I host were back up.
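For reference, restoring the VM backups on the fresh install was mostly a matter of running qmrestore once per VM. A minimal sketch, assuming vzdump archives sitting on mounted backup storage; the path, VM ID, and target storage below are placeholders, not my exact setup:

```
# Restore a vzdump backup archive to VM ID 100 (path, ID, and storage are examples)
qmrestore /mnt/backups/vzdump-qemu-100-2019_01_01-00_00_00.vma.lzo 100 --storage local-lvm

# Start the restored VM and make sure it boots
qm start 100
```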

Around this time, the server I had shipped off to colocation was installed, and I was looking forward to moving services over before I had another issue with my infrastructure at home. It couldn’t happen soon enough. The next day (Saturday) I was feeling better, and the universe decided to test just how much better. Sometime around 1 PM, the power went out. I had the generator up and running within a couple of minutes, but discovered that three of my four UPS units refuse to run on generator power. After about twenty minutes, I had to power down the newly rescued Proxmox server and the file server, which had over 180 days of uptime. I was not happy about this. My plan had been to start migrating services to the newly installed colocation server, but I couldn’t do that with the primary server down. So, with the power out and most of the network offline, I spent an hour or so cleaning up the cabling in the back of the rack, and by the time the power came back on, it looked a bit better. I watched everything come back online for the second time in two days, and once it was all working, I figured I wouldn’t have to deal with this again for a while – the R710 had always been very stable.

Everything was stable Sunday, so I thought I was in the clear.

The following day (Monday), I decided to spend another hour or so in the lab cleaning up the cabling for the client network. I didn’t take a before picture, but it was definitely a mess, and I’m pretty happy with how it came out. Again, everything seemed stable, so I thought I was in the clear – the cluster of issues was over.

[Photo: my rack as of January 2019]

Nope. I woke up Tuesday to devices having a hard time connecting to WiFi, or not connecting at all, and my IP phones showing as unregistered. I went to the lab and found the R710 completely off – and looking down, the UPS that powers it was off too. I have no idea what caused this; the cats can’t have turned it off, since they can’t hold the power button, but something weird must have happened. Regardless, I turned it back on and watched all of my services come back online for the third time in a week. Then, on to the WiFi issue. I looked at the UniFi dashboard and saw that one of my APs was showing as disconnected from the controller. I unplugged that AP and the WiFi issues seemed to stop. A bit later, I tried connecting the offending AP to a different switch port and the issue went away entirely, so I must have plugged it into a port configured for something odd when I cleaned up the client network cabling the previous day. Fortunately, the cluster seems to be over now and everything is running smoothly. Fingers crossed it stays that way.

Lab January 2019 Update

[Photo: my rack as of January 2019]

I came down with the flu last week, so toward the end, as I was getting better, I had some time to work on the cabling in the rack.

Most of the differences from before are just the cleaned-up cabling, a couple of added UPSs, and a Z-Wave stick for Home Assistant.

The R710 runs:

  • WordPress (not for long)
  • Tekkit
  • BIND (authoritative DNS)
  • Accounting (custom written)
  • RADIUS (not working yet)
  • UniFi
  • LimeSurvey
  • APT cache
  • Bitwarden
  • Transmission
  • Simple Invoices
  • Nginx reverse proxy
  • Apache server with various applications, including Nextcloud
  • FreeIPA
  • Email
  • FreePBX
  • UPS monitor
  • MySQL
  • Home Assistant

The 2950 runs:

  • Plex
  • Samba
  • Netatalk
  • NFS

The Dimension E310 runs:

  • pfSense

Migrating LDAP Servers With Nextcloud/ownCloud

Those of you who have seen any of my previous posts know that I have an arsenal of PowerEdge 2950s.  I am trying to move away from the 2950s for power efficiency, and have been consolidating all of my VMs and Docker containers onto one Dell R710 running Proxmox.  Most of the services were an easy move, as the migration only involved copying a virtual machine over and reconfiguring the network adapter.  There are two major exceptions: the MySQL server (which currently runs as a Docker container), and the LDAP server.  The LDAP server migration isn’t really a problem on its own, but the fact that I am going to be using FreeIPA for SSO across my network is.  Basically, I needed to move my Nextcloud users from the existing LDAP server to the IPA server.

A quick Google search turns up very little useful information.  The only thing I found was a post (which I can’t find anymore) suggesting it would be necessary to manually change some things in the “ldap_user_mapping” table in the database.  That part is actually pretty simple, but it took me a while to figure out some of the FreeIPA-specific LDAP settings in Nextcloud.  The first thing is to make sure the two “objectclass” references both equal “person”, not “inetOrgPerson”.  One reference is under Users > Edit LDAP Query, and the second is under Login Attributes > Edit LDAP Query.  Those two settings kept this from working for a couple of hours.  The next step is to go to the Advanced > Directory Settings tab and make sure the “User Name Display Field” is set to “displayName”.  Finally, head over to the Expert tab and set the Internal Username Attribute and both UUID Attribute boxes to “ipaUniqueID”.  This UUID is how Nextcloud keeps track of users.
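If you would rather set these from the command line, the same settings can be applied through occ. A minimal sketch, assuming the LDAP configuration ID is s01 (run occ ldap:show-config to find yours), that Nextcloud runs as www-data, and that the login filter below matches a typical FreeIPA setup – adjust all of those to your environment:

```
# List LDAP configurations to find the config ID ("s01" below is an assumption)
sudo -u www-data php occ ldap:show-config

# Both objectclass references should match "person", not "inetOrgPerson"
sudo -u www-data php occ ldap:set-config s01 ldapUserFilter "(objectclass=person)"
sudo -u www-data php occ ldap:set-config s01 ldapLoginFilter "(&(objectclass=person)(uid=%uid))"

# Display name field, plus FreeIPA's UUID attribute for the internal username and UUIDs
sudo -u www-data php occ ldap:set-config s01 ldapUserDisplayName "displayname"
sudo -u www-data php occ ldap:set-config s01 ldapExpertUsernameAttr "ipauniqueid"
sudo -u www-data php occ ldap:set-config s01 ldapExpertUUIDUserAttr "ipauniqueid"
sudo -u www-data php occ ldap:set-config s01 ldapExpertUUIDGroupAttr "ipauniqueid"
```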

The problem now is that your existing users, when logging in against the new LDAP server, will be presented with a brand new account.  This is not optimal if you already have calendars, contacts, and files stored in your Nextcloud account.  The best way around this that I can tell is to log in with the new user account so that a new user mapping is created, and then copy the old UUID over to the new user.  Just make sure you change something on the old user first, as the UUID field is the primary key for that table, meaning there can’t be two records with the same UUID value.
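In SQL terms, it looks roughly like this. A hedged sketch, assuming the default oc_ table prefix, a database named nextcloud, and placeholder account names – check your actual table layout first, and take a backup before touching anything:

```
# Back up the database first (credentials and names are placeholders)
mysqldump -u root -p nextcloud > nextcloud-backup.sql

mysql -u root -p nextcloud <<'SQL'
-- Note the old account's UUID before changing anything
SELECT owncloud_name, directory_uuid FROM oc_ldap_user_mapping;

-- Change the old record first, since the UUID column must stay unique
UPDATE oc_ldap_user_mapping
   SET directory_uuid = CONCAT(directory_uuid, '-old')
 WHERE owncloud_name = 'olduser';

-- Give the new mapping the old UUID so the existing data follows the user
UPDATE oc_ldap_user_mapping
   SET directory_uuid = 'the-old-uuid-value'
 WHERE owncloud_name = 'newuser';
SQL
```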

Network Overhaul, and the Addition of an R710

My lab has been running stably for at least a solid year now, so naturally it is time to make some changes.  I have some new things I want to experiment with that the current setup just doesn’t have the flexibility for.  I am going to completely overhaul my rack and everything in it, with a few goals in mind that will hopefully make my environment more conducive to my compute needs and planned future experimentation.


This Blog’s Infrastructure

The stack that runs this WordPress installation evolved over many months; from the beginning of 2017 to now, I have been fine-tuning my lab to accommodate a number of services and applications, including those needed to run this blog.  The whole process really started years ago when I first set up a home server, but that’s a topic for another post.  Here, I will give you the basic rundown of how this shit works.
