My Colocation Deployment

How This Started

Hosting this site along with a few others, I have come to realize that 5 Mbps of upload bandwidth is not enough for web hosting.  This site’s home page sometimes took as much as 23 seconds to fully load.  That was completely unacceptable to me, as I also host a website for my DJ business and need that one to load as quickly as possible.  So back in late September or early October, I made a post on r/homelab asking for people’s recommendations for cheap 1U colocation.  I had a PowerEdge R330 that I wanted to run with more WAN bandwidth, along with highly available WAN and power.  I figured this would get me the faster loading times and better uptime I was looking for.  Someone reached out and told me that they had a few rack spaces available in their colocation rack that they weren’t using, and were willing to rent one out to me for a reasonable price.  We discussed logistics and agreed that once my hardware was properly configured, I would ship the system out to him to install.

Configuring the Hardware

The first thing I did was grab the latest release of Proxmox VE and install it to a flash drive inside the R330.  I thought this was the best way to run it because it would limit wear on the single SSD I was going to install for VM storage.  This soon became an issue because the system didn’t like booting off of the flash drive.  I had issues where the root volume wouldn’t mount in time and the drive mounting timed out during boot.  I temporarily solved this by setting a higher rootdelay in the GRUB configuration, but the issue returned after a week or so.  Eventually I gave up and reinstalled PVE on the SSD.  I test booted the system about 25 times, and it came up every time with no reconfiguration of the GRUB file.  This ended up being for the best, because my R710 that booted PVE from an internal flash drive started having issues related to the age of the flash drive (about two years old) around this time.
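For reference, the rootdelay workaround is just a kernel boot parameter.  On a Debian-based system like PVE it looks roughly like this (the 10-second value is only an example, not necessarily what I used):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=10"

    # then regenerate the GRUB config and reboot
    update-grub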

I configured my networks and the internal pfSense with the WAN information provided to me and the LAN settings required to make it a part of my existing network infrastructure, including adding an OpenVPN client to bridge the colocation LAN to my existing home LAN.  I also set up an Ubuntu desktop VM that I could use to access the pfSense web UI if I screwed up the network settings at some point.  (I have a separate management LAN that I use for the Proxmox VE management interface and the iDRAC interface.  My pfSense router at home maintains a site-to-site VPN so I can always access these management interfaces.)
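pfSense builds the OpenVPN client from settings in its web UI, but for anyone curious, the equivalent raw client config would look roughly like the sketch below.  This assumes a shared-key site-to-site tunnel, and the hostname, port, key path, and subnets are all placeholders rather than my actual values:

    # hypothetical stand-in for the pfSense OpenVPN client settings
    dev tun
    proto udp
    remote home.example.net 1194
    secret /etc/openvpn/site-to-site.key
    ifconfig 10.0.8.2 10.0.8.1          # tunnel addresses: colo side, home side
    route 192.168.1.0 255.255.255.0     # send home-LAN traffic through the tunnel
    keepalive 10 60
    persist-key
    persist-tun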

Shipping

When I was done with all of the configuration and testing (making sure everything came up correctly on reboots, verifying network connectivity, making sure the drive was healthy…) I wrapped the server in anti-static bubble wrap.  After using nearly the entire roll, I boxed the system up with the power and network cables and rails.  I used almost an entire roll of packing tape to make sure the box didn’t open in transit, and took it to the UPS store for shipping.  For two weeks, I checked the shipping tracker daily waiting to see when my machine had safely arrived.  When it did, I was finally able to let out a sigh of relief.  The system was installed the day it arrived, and I was ecstatic to see the virtual pfSense pop up in the OpenVPN connected clients log.

Getting Set Up at the New Datacenter

I tried pinging the system and the router, and accessing the management interfaces, to verify everything was working, and it was.  Well, almost everything was working.  I was unable to log into my iDRAC, which required a KVM module to be connected to the server so I could reset it.  Once that was taken care of, I was able to migrate my WordPress installations and relevant MySQL databases over, along with my authoritative DNS server.  A new Traefik server now proxies all incoming HTTP and HTTPS requests.  I was also able to set up a couple of new services, like GitLab (a self-hosted Git service similar to GitHub, for collaborative software development) and a backup FreePBX server for my home telephone system (if the main PBX goes down, the phones should switch to the backup PBX for inbound and outbound calls).
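To give an idea of the Traefik side of things, a minimal config for terminating HTTP/HTTPS and routing by hostname looks something like the sketch below.  I’m assuming Traefik 1.x TOML syntax here, and the email, domain, and backend address are placeholders, not my actual setup:

    # traefik.toml -- hypothetical minimal example
    defaultEntryPoints = ["http", "https"]

    [entryPoints]
      [entryPoints.http]
      address = ":80"
        [entryPoints.http.redirect]
        entryPoint = "https"
      [entryPoints.https]
      address = ":443"
        [entryPoints.https.tls]

    [acme]
    email = "admin@example.com"
    storage = "acme.json"
    entryPoint = "https"
      [acme.httpChallenge]
      entryPoint = "http"

    # one frontend/backend pair per hosted site
    [file]

    [backends]
      [backends.wordpress]
        [backends.wordpress.servers.server1]
        url = "http://10.0.10.20:80"

    [frontends]
      [frontends.blog]
      backend = "wordpress"
        [frontends.blog.routes.main]
        rule = "Host:blog.example.com"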

Conclusion

So far, I could not be happier with the performance and reliability of this new colocation setup.  I highly recommend it if you are trying to manage your own hosting but need uptime and bandwidth that you just can’t get at home.

Bringing In The New Year With a Cluster of Network Problems

I always seem to have issues, particularly with my network, in clusters.

I caught the flu on New Year’s Eve and was out of service for about a week.  During the first day or two of being sick, my R710 running Proxmox and all of my VMs had an issue and kernel panicked.  I didn’t get any details as to what the exact issue was because I was too out of it.  I did try to restart it, but had no luck; it never managed to fully boot.  While I was sick, I was able to re-install Proxmox to a new RAID 1 array (PVE was previously installed on a flash drive, and I think that had something to do with the problem) and restore all of my backed-up VMs.  I was still pretty out of it while I did this, but everything worked fine afterward, and I was relieved that everything was working again – Home Assistant was controlling all of the outside lights, the telephone system was working, and the websites I host were back up.
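Restoring the VMs themselves was the easy part, since Proxmox can rebuild a guest straight from its vzdump archive.  As a rough example (the backup file name, VM ID, and storage name below are placeholders, not my actual ones):

    # restore one VM from its vzdump backup onto the new array
    qmrestore /var/lib/vz/dump/vzdump-qemu-100-2019_01_01-00_00_00.vma.lzo 100 --storage local-lvm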

Around this time, the server I shipped off to colocation was installed, and I was looking forward to getting services moved over before I had another issue with my infrastructure at home.  This couldn’t happen soon enough.  The next day (Saturday) I was feeling better, and the universe decided to test just how much better I was feeling.  Sometime around 1 PM, the power went out.  I got the generator up and running within a couple of minutes, but found out that three of my four UPS units do not run on generator power.  After about twenty minutes, I had to power down the newly-rescued Proxmox server and the file server with over 180 days of uptime.  I was not happy about this.  My plan had been to work on migrating services over to the newly installed colocation server, but I couldn’t do that if the primary server was down.  With the power out and most of the network down, I worked on cleaning up my cabling.  I worked on the cabling in the back for an hour or so, and when the power came back on, the back looked a bit better.  I watched as everything came back online for the second time in two days, and once everything was working, I figured I wouldn’t have to deal with this issue again for a while, as the R710 used to be very stable.

Everything was stable Sunday, so I thought I was in the clear.

The following day (Monday), I decided to spend another hour or so in the lab and work on cleaning up the cabling for the client network.  I didn’t take a before picture, but it definitely looked a mess.  I’m pretty happy with the way it came out.  Again, everything seemed stable, so I thought I was in the clear – the cluster of issues was over.

[Rack photo: my rack as of January 2019]

Nope.  I woke up on Tuesday with devices having a hard time connecting to WiFi, or not connecting at all, and my IP phones showing as unregistered.  I went to the lab and saw that the R710 was completely off.  Looking down, the UPS that powers it was completely off too.  I have no idea what could have caused this.  The cats can’t turn it off because they can’t hold the power button, but something weird must have happened.  Regardless, I turned it back on and watched all of my services come back online for the third time in a week.  Now on to the WiFi issue: devices were either taking a long time to connect or not connecting at all.  I looked at the UniFi dashboard and saw that one of my APs was showing as disconnected from the UniFi controller.  I disconnected this one and the WiFi issues seemed to stop.  A bit later I tried connecting the offending AP to a different switch port and the issue went away, so I must have plugged it into a port configured for something weird when I cleaned up the client network cabling the previous day.  Fortunately, the cluster seems to be over now and everything is running smoothly.  Fingers crossed it stays that way.

Lab January 2019 Update

[Rack photo: my rack as of January 2019]

I came down with the flu last week, so towards the end, when I was getting better, I had some time to work on the cabling in the rack.


Most of the differences here from before are just cleaning up the cabling, adding a couple of UPSs, and the addition of a Z-Wave stick for Home Assistant.
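For anyone wondering, wiring the Z-Wave stick into Home Assistant only takes a couple of lines in configuration.yaml (this uses the zwave integration that was current at the time; the device path below is a guess and depends on how the stick enumerates):

    zwave:
      usb_path: /dev/ttyACM0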


The R710 runs:

  • WordPress (not for long)
  • Tekkit
  • BIND (authoritative DNS)
  • Accounting (custom written)
  • RADIUS (not working yet)
  • UniFi
  • LimeSurvey
  • APT cache
  • Bitwarden
  • Transmission
  • Simple Invoices
  • Nginx reverse proxy
  • Apache server with various applications, including Nextcloud
  • FreeIPA
  • Email
  • FreePBX
  • UPS monitor
  • MySQL
  • Home Assistant

The 2950 runs:

  • Plex
  • Samba
  • Netatalk
  • NFS

The Dimension E310 runs:

  • pfSense

Migrating LDAP Servers With Nextcloud/ownCloud

Those of you who have seen any of my previous posts know that I have an arsenal of PowerEdge 2950s.  I am trying to move away from the 2950s for power efficiency and have been consolidating all of my VMs and Docker containers onto one Dell R710 running Proxmox.  Most of the services were an easy move, as the migration only involved sliding over a virtual machine and reconfiguring the network adapter.  There are two major exceptions to this: one is the MySQL server (which is currently running as a Docker container), and the other is the LDAP server.  The LDAP server migration isn’t really a problem on its own, but the fact that I am going to be using FreeIPA for SSO across my network is.  Basically, I needed to move my Nextcloud users from the existing LDAP server to the IPA server.

A quick search on Google turns up very little useful information.  The only thing I found was a post (which I can’t find anymore) suggesting it would be necessary to manually change some things in the “ldap_user_mapping” table in the database.  That is actually a pretty simple task, but it took me a while to figure out some of the FreeIPA-specific LDAP settings in Nextcloud.  The first thing is to make sure the two “objectclass” references both equal “person”, and not “inetOrgPerson”.  One reference is under Users > Edit LDAP Query, and the second is under Login Attributes > Edit LDAP Query.  Those two settings kept me from getting this to work for a couple of hours.  The next step is to go to the Advanced > Directory Settings tab and make sure the “User Name Display Field” is set to “displayName”.  Finally, head over to the Advanced tab and set the Internal Username Attribute and both UUID Attribute boxes to “ipaUniqueID”.  This UUID is how Nextcloud keeps track of users.
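The same settings can also be applied from the command line with occ, which is handy if the wizard fights you.  A rough sketch, assuming the LDAP configuration ID is s01 (check yours with occ ldap:show-config), that occ is run from the Nextcloud directory as the web server user, and that you use raw filters:

    # raw user/login filters using the "person" objectclass
    sudo -u www-data php occ ldap:set-config s01 ldapUserFilter "(objectclass=person)"
    sudo -u www-data php occ ldap:set-config s01 ldapLoginFilter "(&(objectclass=person)(uid=%uid))"

    # display name, internal username, and user UUID attributes for FreeIPA
    sudo -u www-data php occ ldap:set-config s01 ldapUserDisplayName displayName
    sudo -u www-data php occ ldap:set-config s01 ldapExpertUsernameAttr ipaUniqueID
    sudo -u www-data php occ ldap:set-config s01 ldapExpertUUIDUserAttr ipaUniqueID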

The problem now is that your existing users, when logging in against the new LDAP server, will be presented with a new account.  This is not optimal if you already have calendars, contacts, and files stored in your Nextcloud account.  The best way around this, as far as I can tell, is to log in with the new user account so that a new user mapping is created, and then copy the old UUID over to the new user.  Just make sure you change something on the old user first, as the UUID field is the primary key for that table, meaning there can’t be two records with the same UUID value.
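In SQL terms, that swap looks roughly like the following.  The table and column names assume a default “oc_” prefix and the schema I saw at the time, and the usernames and UUIDs are placeholders; back up the database before touching it:

    -- free up the UUID on the old mapping first
    UPDATE oc_ldap_user_mapping
       SET directory_uuid = 'retired-old-uuid'
     WHERE owncloud_name  = 'olduser';

    -- then point the new mapping at the old UUID
    UPDATE oc_ldap_user_mapping
       SET directory_uuid = 'old-uuid-value'
     WHERE owncloud_name  = 'newuser';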

Network Overhaul, and the Addition of an R710

My lab has been running pretty stably now for at least a solid year, so naturally it is time to make some changes.  I have some new things I want to experiment with that I just don’t have the flexibility for right now.  I am going to completely overhaul my rack and everything in it, with a few changes that will hopefully make my compute environment more conducive to my goals and planned future experimentation.


This Blog’s Infrastructure

The stack that runs this WordPress installation evolved over many months; from the beginning of 2017 to now, I have been fine-tuning my lab to accommodate a number of services and applications, including those needed to run this blog.  The whole process really started years ago when I first set up a home server, but that’s a topic for another post.  Here, I will give you the basic rundown of how this shit works.
