Now that Lemmy 0.19.0 has been out for a few days, we will be proceeding with the update here on Lemmy.tf. I am tentatively planning to kick this off at 4pm EST today (3.5 hrs from the time of this post).
All instance data will be backed up prior to the update. This one will include a handful of major changes, the most impactful being that any existing 2FA configurations will be reset. Lemmy.ca has a post with some great change info - https://lemmy.ca/post/11378137
I can post this because I'm still signed in on Jerboa, but I can't log in on my browser or reset my password.
My password is only 8 characters, and the login form won't let me click Login until I've typed at least 10. When I go to Forgot Password and click Reset Password, I never get an email to reset it.
I'm having the same problem on another instance but I can't even use jerboa with that one.
I had the same problem on lemmy.world but password reset worked there.
I wanted to sign up for this instance, but it gave me an "email not sent" error and is now telling me my email already exists on this instance. I haven't gotten a verification email and I can't log in. Am I just locked out of signing up now?
I noticed some timeouts and DB lag when I logged in early this afternoon, so I have gone ahead and updated the instance to 0.18.4 to hopefully help clear this up.
As I'm sure everyone noticed, the server died hard last night. Apparently, even though OVH advised me to disable proactive interventions, I learned this morning that "the feature is not yet implemented" and that they have proceeded to go press the reset button on the machine every time their shitty monitoring detects the tiniest of ping loss. Last night, this finally made the server mad enough not to come back up.
Luckily, I did happen to have a backup from about 2 hours before the final outage. After a slow migration to the new DC, we are up and running on the new hardware. I'm still finalizing some configuration changes and need to do performance tuning, but once that's done our outage issue will be fully resolved.
Issues:
[Fixed] Pict-rs missing some images. This was caused by an incomplete OVA export; all older images were recovered from a slightly older backup.
[Fixed?] DB or federation issues: seeing some slowness and occasional errors/crashes due to the DB timing out. *
(I'm creating a starting guide post here. Have patience, it will take some time...)
Disclaimer: I am new to Lemmy like most of you. Still finding my way. If you see something that isn't right, let me know. If you have additions, please comment!
Welcome!
Welcome to Lemmy (on whichever server you're reading this)
About Lemmy
Lemmy is a federated platform for news aggregation / discussion. It's being developed by the Lemmy devs: https://github.com/LemmyNet
About Federation
What does federation mean?
It means Lemmy uses a protocol (ActivityPub) that makes it possible for all Lemmy servers to interact with each other:
- You can search and view communities on remote servers from here
- You can create posts in remote communities
- You can respond to remote posts
- You will be notified (if you wish) of comments on your remote posts
- You can follow Lemmy users/communities on other platforms that also use ActivityPub
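For example, if a community lives on another server, you can paste its full handle (something like !somecommunity@example-instance.org, purely an illustration) or its URL into the search bar here; this instance will fetch it, and you can then browse and subscribe to it as if it were local.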
So after a few days of back and forth with support, I may have finally received some insight as to why the server keeps randomly rebooting. Apparently, their crappy datacenter monitoring keeps triggering ping loss alerts, so they send an engineer over to physically reboot the server every time. I was not aware that this was the default monitoring option on their current server lines, and have disabled it so this should avoid forced reboots going forward.
I am standing up a basic ping monitor to alert me via email and SMS if the server actually goes down, so I can quickly reboot it myself if ever needed (I may even write a script to reboot via the API after x consecutive ping failures, or something). The full monitoring stack is still in progress but not truly necessary to ensure stability at the moment.
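In case it's useful to anyone running something similar, here is a minimal sketch of what that watchdog could look like, assuming OVH's python-ovh client and its dedicated-server reboot endpoint; the hostname, service name, and thresholds below are placeholders, not this instance's real values.

```python
#!/usr/bin/env python3
"""Hypothetical watchdog sketch: ping the server, and after N consecutive
failures ask the OVH API for a hard reboot. All values are placeholders."""

import subprocess
import time

import ovh  # pip install ovh; reads API credentials from ovh.conf or env vars

HOST = "lemmy.example.org"          # placeholder hostname to ping
SERVICE_NAME = "ns1234567.example"  # placeholder OVH dedicated-server service name
FAIL_THRESHOLD = 5                  # consecutive failed pings before rebooting
INTERVAL = 60                       # seconds between checks


def ping_ok(host: str) -> bool:
    """Return True if a single ICMP ping to the host succeeds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "5", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def main() -> None:
    client = ovh.Client()  # endpoint and keys come from ovh.conf or env vars
    failures = 0
    while True:
        if ping_ok(HOST):
            failures = 0
        else:
            failures += 1
            if failures >= FAIL_THRESHOLD:
                # Hard reboot through the dedicated-server API, then start counting again.
                client.post(f"/dedicated/server/{SERVICE_NAME}/reboot")
                failures = 0
        time.sleep(INTERVAL)


if __name__ == "__main__":
    main()
```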
OVH has scheduled a maintenance window for 5:00 EST this evening, hopefully they will be able to pinpoint the fault and get parts replaced at the same time. This will likely be an extended outage as they have more diagnostics than I was able to run, so I would expect somewhere around an hour or two of downtime during this.
I am mildly tempted to go ahead and migrate Lemmy.tf off to my new environment but it would incur even more downtime if I rush things, so it'll have to be sometime later.
Update 7:30PM:
I just received a response on my support case; they did not replace any hardware and claim their own diagnostics tool is buggy. We may be having a rushed VM migration over to a new server in the next few days... which would incur a few hours of hard downtime to migrate over to the new server (and datacenter) and switch DNS. Ideally I'd prefer to have time to plan it out and prep for a seamless cutover, but I think a few hours of downtime over the weekend is worth it to end the random reboots.
Support is getting a window scheduled for their maintenance. I've asked for late afternoon/early evening today with a couple hours advance notice so I can post an outage notice.
===========
UPDATE 12:00AM:
Diagnostics did in fact return with a CPU fault. I've requested they schedule the downtime with me but technically they can proceed with it whenever they want to, so there's a good chance there will be an hour or so of downtime whenever they get to my server- I'll post some advance notice if I'm able to.
===========
As I mentioned in the previous post, we appear to have a hardware fault on the server running Lemmy.tf. My provider needs full hardware diagnostics before they can take any action, and this will require the machine to be powered down and rebooted into diagnostics mode. This should be fairly quick (~15-20 mins ideally) and, since it is required to determine the issue, it needs to be done ASAP.
I will be taking everything down at 11:00PM EST tonight to let them run the diagnostics.
EDIT 07/24:
This is an ongoing issue and may be a hardware fault with the machine the instance is running on. I've opened a support case with OVH to have them run diagnostics and investigate. In the meantime I am getting a SolarWinds server spun up to alert me anytime we have issues so I can jump on and restore service. I am also looking into migrating Lemmy.tf over to another server, but this will require some prep work to avoid hard downtime or DB conflicts during DNS cutover.
==========
OP from 07/22:
Woke up this morning to notice that everything was hard down. Something tanked my bare-metal server at OVH overnight, and apparently the Lemmy VM was not set to autostart. This has been corrected, and I am digging into what caused the outage in the first place.
I know there is some malicious activity going on with some of the larger instances, but as of this time I am not seeing any evidence of intrusion attempts or a DDoS or anything.
## What is Lemmy?
Lemmy is a self-hosted social link aggregation and discussion platform. It is completely free and open, and not controlled by any company. This means that there is no advertising, tracking, or secret algorithms. Content is organized into communities, so it is easy to subscribe to t...
Lemmy 0.18.1 dropped yesterday and seems to bring a lot of performance improvements. I have already updated the sandbox instance to it and am noticing that things are indeed loading quicker.
I'm planning to upgrade this instance sometime tomorrow evening (8/9 around 6-7pm EST). Based on the update in sandbox, I expect a couple minutes of downtime while the database migrations run.
I'm running the Lemmy Community Seeder script on our instance to prepopulate some additional communities. This is causing some sporadic JSON errors on the account I'm using with the script, but hopefully isn't impacting anyone else. Let me know if it is, and I'll halt it and schedule it for late-night runs only or something.
Right now I have it watching the following instances, grabbing the top 30 communities of the day on each scan.
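For anyone curious how this kind of seeding works in general, here's an illustrative sketch (not the actual Lemmy Community Seeder code): pull the top communities of the day from each remote instance, then resolve each one on the local instance so federation starts syncing it. It assumes the v3 `community/list` and `resolve_object` endpoints; the instance names and login token below are placeholders.

```python
#!/usr/bin/env python3
"""Illustrative community-seeding pass, not the real LCS script.
All instance names and the token below are placeholders."""

import requests

LOCAL = "https://lemmy.example.org"  # placeholder local instance URL
REMOTES = ["lemmy.example.net"]      # placeholder watch list
TOP_N = 30                           # top communities of the day per scan
BOT_JWT = "changeme"                 # placeholder login token for the seeding account


def top_communities(remote: str, limit: int) -> list[str]:
    """Return '!name@host' identifiers for a remote instance's top local communities."""
    resp = requests.get(
        f"https://{remote}/api/v3/community/list",
        params={"type_": "Local", "sort": "TopDay", "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return [f"!{c['community']['name']}@{remote}" for c in resp.json()["communities"]]


def resolve_locally(identifier: str) -> None:
    """Ask the local instance to fetch the remote community so it exists locally."""
    resp = requests.get(
        f"{LOCAL}/api/v3/resolve_object",
        params={"q": identifier, "auth": BOT_JWT},  # 0.19+ expects a Bearer header instead
        timeout=60,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    for remote in REMOTES:
        for ident in top_communities(remote, TOP_N):
            resolve_locally(ident)
```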
I've been stalling on this, but with the added growth from the Reddit shutdown I need to get some form of community rules out. These will likely be tweaked a bit going forward, but this is a start.
Rules
1. Be respectful of everyone's opinions. If you disagree with something, don't resort to inflammatory comments.
2. No abusive language/imagery. Just expanding on #1.
3. No racism or discrimination of any kind.
4. No advertising.
5. Don't upload NSFW content directly to the instance; use a third-party image host and link to that in your posts/comments.
6. Mark any NSFW/erotic/sensitive/etc. posts with the NSFW tag. Any local posts violating this rule are subject to removal (but you can repost correctly if this happens).
7. Hold the admins/mods accountable. If we start making changes that you disagree with, please feel free to post a thread or DM us to discuss! We want this instance to be a good home for everyone and welcome feedback and discussion.
Obviously, with the closing of the Reddit API, all of our favourite apps no longer work and you've ended up here looking for alternatives. Lemmy in general isn't Reddit; it's just similar, so it will take a minute to adapt, but as with all communities, it's you that makes it. Thanks especially for choosing Lemmy.tf as your instance. It's nice, speedy, and well maintained. I look forward to seeing you all around. Have fun, and if you have any questions, feel free to ask.
So Lemmy 0.18.0 dropped today and I immediately jumped on the bandwagon and updated. That was a mistake. I did the update during my lunch hour, quickly checked to make sure everything was up (it was, at the time) and came back a few hours later to everything imploding.
As far as I can tell, things broke after the DB migrations occurred. Pict-rs was suddenly dumping stack traces on any attempt to load an image, and then at some point the DB itself fell over and started spewing duplicate key errors in an endless loop.
I wound up fiddling with container versions in docker-compose.yml until finding a combination that restored the instance. We are downgraded back to the previous pict-rs release (0.3.1), while Lemmy and Lemmy-UI are both at 0.18.0. I'm still trying to figure out what exactly went wrong so I can submit a bug report on GitHub.
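For reference, a pin like that in docker-compose.yml would look roughly like the trimmed sketch below; the image names assume the commonly used upstream images, and everything else (ports, volumes, config) is omitted, so it may not match this instance's actual compose file.

```yaml
# Trimmed sketch of the relevant image pins only, not a complete compose file.
services:
  lemmy:
    image: dessalines/lemmy:0.18.0      # backend on 0.18.0
    restart: always
  lemmy-ui:
    image: dessalines/lemmy-ui:0.18.0   # UI on 0.18.0
    restart: always
  pictrs:
    image: asonix/pictrs:0.3.1          # rolled back to the previous pict-rs release
    restart: always
```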
Going forward, I will plan updates more carefully. We will have planned maintenance windows posted at least a few days in advance, and I may look into migrat