Show Posts

Messages - Stavros

Pages: [1] 2 3
1
Drivers / Re: addr data lost during connection
« on: July 23, 2014, 08:32:59 PM »
Got it.

Code: [Select]
diff --git a/src/comm.cc b/src/comm.cc
index 0754d70..7e83248 100644
--- a/src/comm.cc
+++ b/src/comm.cc
@@ -200,7 +200,7 @@ void async_on_accept(int new_socket_fd, port_def_t *port) {
   user->external_port = (port - external_port);  // FIXME: pointer arith

   user->addrlen = sizeof(user->addr);
-  getsockname(new_socket_fd, (sockaddr *)&user->addr, &user->addrlen);
+  getpeername(new_socket_fd, (sockaddr *)&user->addr, &user->addrlen);

   // network layer done, hand-off to main thread.
   add_realtime_event([=]() { new_user_handler(user); });

getsockname() retrieves the info for the *local* end of the socket (i.e. the address and port of the server). getpeername() retrieves the info for the *remote* host connected on that socket.
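
For anyone who wants to see the difference outside the driver, here is a minimal standalone sketch. It is not FluffOS code: probe_loopback() and AcceptedPorts are made-up names, and since both ends sit on 127.0.0.1 it compares ports rather than IP addresses. It assumes a POSIX system with loopback networking available.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Ports observed on one loopback connection, in host byte order (-1 = unknown).
struct AcceptedPorts {
  int listen_port = -1;    // port the throwaway server listens on
  int client_port = -1;    // ephemeral port the throwaway client connects from
  int sockname_port = -1;  // getsockname() on the accepted socket
  int peername_port = -1;  // getpeername() on the accepted socket
};

// Hypothetical demo helper, not driver code: accepts one loopback connection
// and records what each call reports for the accepted descriptor.
AcceptedPorts probe_loopback() {
  AcceptedPorts out;
  sockaddr_in addr{};
  socklen_t len = sizeof(addr);

  int listener = socket(AF_INET, SOCK_STREAM, 0);
  int client = socket(AF_INET, SOCK_STREAM, 0);
  int accepted = -1;

  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a free port

  if (listener >= 0 && client >= 0 &&
      bind(listener, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0 &&
      listen(listener, 1) == 0 &&
      getsockname(listener, reinterpret_cast<sockaddr *>(&addr), &len) == 0) {
    out.listen_port = ntohs(addr.sin_port);

    // A blocking connect() to a listening loopback socket completes before accept().
    if (connect(client, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0) {
      sockaddr_in tmp{};
      len = sizeof(tmp);
      if (getsockname(client, reinterpret_cast<sockaddr *>(&tmp), &len) == 0) {
        out.client_port = ntohs(tmp.sin_port);
      }

      accepted = accept(listener, nullptr, nullptr);
      if (accepted >= 0) {
        len = sizeof(tmp);
        if (getsockname(accepted, reinterpret_cast<sockaddr *>(&tmp), &len) == 0) {
          out.sockname_port = ntohs(tmp.sin_port);  // local (server) side
        }
        len = sizeof(tmp);
        if (getpeername(accepted, reinterpret_cast<sockaddr *>(&tmp), &len) == 0) {
          out.peername_port = ntohs(tmp.sin_port);  // remote (client) side
        }
      }
    }
  }

  if (accepted >= 0) close(accepted);
  if (client >= 0) close(client);
  if (listener >= 0) close(listener);
  return out;
}
```

On the accepted descriptor, sockname_port should match the listening port and peername_port should match the client's port, which is the same mix-up that made every player look like the server.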

2
Drivers / Re: addr data lost during connection
« on: July 22, 2014, 10:58:19 PM »
Sorry, I wasn't very detailed in that last post - I was writing from my phone on my way to work. Here is the deal:

On Sunday evening, we upgraded to the latest driver revision (git commit 8eba32bf) on our live mud. We have run into a couple issues (including the go-ahead issue from my previous post), but the one I'm working on now has to do with the fact that when we run query_ip_number() on our playerbase to see what's up, everyone appears to be logged on from the mud server.

Using various debug levels, I've narrowed the problem down to one function: async_on_accept(), in comm.cc.

I used "event" debugging (-devent as my driver argument), and when I connect to my mud server from an external IP, I get a debug message in on_external_port_event() (in event.cc), which indicates that the connection is coming from the proper IP address - that of the external host that is trying to telnet into the mud server.

Then I use "connections" debugging (actually just used -d as a driver option, which does connections and telnet debugging), and when I connect to the mud server from an external IP, I get a debug message in new_user_handler() (in comm.cc) which claims that the IP address of the new user is the same as the server's, even though it is not.

The function that gets called by on_external_port_event(), and then calls new_user_handler(), is async_on_accept().

Looking through async_on_accept(), I can't see any indication that the "addr" data structure (which apparently holds the IP address information) is passed from on_external_port_event() (the last known place where the correct IP data is held), and even if it is passed, I can't see any indication that it is copied into the newly-initialized "user" data structure.

However, I can't see any indication that *anything* is copied into user's addr data structure - as far as I can tell, it should be all zeros. But it shows up later with the server's IP address. So something must be happening behind the scenes that I don't understand.

So, here are the questions:

1. Can anyone confirm that, with the latest version of next-3.0, all users have the server's IP address when you use query_ip_number(), even when users are logged in from a separate host?

2. If (1) is confirmed, can anyone explain why the server's IP address shows up, even though nothing is apparently copied into user->addr? And can we get the user's actual IP address information properly copied into the interactive_t?

3. If (1) cannot be confirmed... why does my MUD look like the end of Spartacus?

3
Drivers / addr data lost during connection
« on: July 22, 2014, 09:24:22 AM »
Still working on figuring out exactly what is happening and how, but:

In async_on_accept(), as "user" (an interactive_t) is being populated, addr is not set.

Somehow, this translates to all users' IP showing up as the server's IP.

You can see this by doing -devent (shows correct IP in on_external_port_event()), then -dconnections (shows server IP address in new_user_handler()).

Further updates as I work on it later tonight.

Can anyone else confirm this problem? Should be easy enough... Compare the query_ip_number() of your players to your server IP.

4
Drivers / restoring go-ahead after write_prompt
« on: July 20, 2014, 10:22:32 PM »
When the switch was made to libtelnet (commit a901c645), one bit of code that was removed was sending a go-ahead (TELNET_GA) command after each prompt. A lot of clients use this command to differentiate prompts from other text. Here is a small patch to add it back in, but using the libtelnet API:

Code: [Select]
diff --git a/src/comm.cc b/src/comm.cc
index 0754d70..4e5c4a3 100644
--- a/src/comm.cc
+++ b/src/comm.cc
@@ -1371,6 +1371,9 @@ static void print_prompt(interactive_t *ip) {
   if (!IP_VALID(ip, ob)) {
     return;
   }
+  if ((ip->iflags & USING_TELNET) && !(ip->iflags & SUPPRESS_GA)) {
+    telnet_iac(ip->telnet, TELNET_GA);
+  }
 } /* print_prompt() */

 /*
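
For reference, a rough sketch of what ends up on the wire when the go-ahead is appended. This is not driver or libtelnet code: frame_prompt() is a hypothetical helper, the two flags stand in for the USING_TELNET and SUPPRESS_GA checks in the patch, and the prompt is assumed to be plain ASCII (a literal 255 byte would need escaping). The byte values are the IAC and GA codes from the Telnet spec.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint8_t kIac = 255;  // Telnet "Interpret As Command"
constexpr std::uint8_t kGa = 249;   // Telnet "Go Ahead"

// Hypothetical helper: returns the bytes sent for one prompt. The flags mirror
// the two conditions tested in the patch; they are plain bools here.
std::vector<std::uint8_t> frame_prompt(const std::string &prompt,
                                       bool using_telnet, bool suppress_ga) {
  std::vector<std::uint8_t> out(prompt.begin(), prompt.end());
  if (using_telnet && !suppress_ga) {
    out.push_back(kIac);  // command introducer
    out.push_back(kGa);   // marks the end of the prompt for the client
  }
  return out;
}
```

A client that keys on go-ahead sees the two trailing bytes and knows the preceding text was a prompt; with suppress_ga set, the prompt goes out unchanged.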

5
Drivers / Re: FluffOS 3.0-alpha7.4
« on: July 14, 2014, 07:35:05 PM »
I've run into similar problems on my home workstation (running Arch Linux) where I have libevent_pthreads.so, but the config system doesn't detect it. I fixed it by adding -levent_pthreads and -levent to the LDFLAGS, and adding -levent_pthreads to the DRIVER_BIN target. You can do this by going into the Makefile and changing

Code: [Select]
LDFLAGS= -march=native  -O0 -g -gdwarf-2 -Wall -Weffc++ -pedantic -D_FORTIFY_SOURCE=2 -DDEBUG -DDEBUG_MACRO -std=c++11 -D_GNU_SOURCE -fno-omit-frame-pointer -flto  -flto -rdynamic
to
Code: [Select]
LDFLAGS= -march=native  -O0 -g -gdwarf-2 -Wall -Weffc++ -pedantic -D_FORTIFY_SOURCE=2 -DDEBUG -DDEBUG_MACRO -std=c++11 -D_GNU_SOURCE -fno-omit-frame-pointer -flto  -flto -rdynamic -levent_pthreads -levent
and changing

Code: [Select]
$(CXX) $(LDFLAGS) $(OBJ) packages/*.$(O) `./dtrace_compile` -o $(DRIVER_BIN) $(LIBS)     -lssl -lcrypto   -levent
to
Code: [Select]
$(CXX) $(LDFLAGS) $(OBJ) packages/*.$(O) `./dtrace_compile` -o $(DRIVER_BIN) $(LIBS)     -lssl -lcrypto   -levent -levent_pthreads
Your LDFLAGS and DRIVER_BIN line may be slightly different from mine, depending on what libs you have installed, etc. You should still be able to identify the lines from this and add the appropriate flags.

Also, you want to do this after you have run "./build.FluffOS" (otherwise you won't have a Makefile), but before you run make - if you have already run make, be sure to do a "make clean" before running make again after this fix.

As I said, this worked for me, but I can't guarantee it will work for you. I have no idea why the config script doesn't recognize event_pthreads on our systems, but I'm happy to run tests and provide any information necessary for anyone who wants to try to debug it. :)

UPDATE 10 minutes after posting:

I did this fix a long while ago... Now, looking at it with fresh eyes, I realized (1) there is a LIBS define that should be used for -l flags instead of LDFLAGS, and (2) you don't need to add -levent_pthreads to the DRIVER_BIN line. So just add -levent and -levent_pthreads to LIBS, and it compiles fine.

6
Drivers / Re: bug with call_out and shadows
« on: January 13, 2014, 03:25:26 PM »
Yes, definitely fixed, sorry I didn't follow up. We ran 3.0 alpha on our test mud for a while with the patch and had no crashes. So we made the upgrade on our live mud a couple weeks ago, and everything's running great.

7
Drivers / Re: bug with call_out and shadows
« on: November 24, 2013, 02:47:41 PM »
Awesome, thanks for the tip re: SIGPIPE.

8
Drivers / Re: bug with call_out and shadows
« on: November 23, 2013, 11:42:41 AM »
Another issue we've run into (but which I'm not sure how to fix) is in get_user_data() in comm.cc. In some circumstances, when num_bytes == -1 (line 1308), it calls remove_interactive(), which ends up trying to flush messages. If the connection is gone (presumably one of the reasons num_bytes could be -1), the flush errors with "broken pipe" and the whole driver hangs.

I saw this in a backtrace, but I wasn't able to recreate it in any controlled situations. If I can figure out some LPC code that forces it, I'll post here.

Here's the backtrace, for the record:

Code: [Select]
#0  0xf57fe416 in __kernel_vsyscall ()
#1  0xb7730e61 in send () from /lib/i386-linux-gnu/libpthread.so.0
#2  0x080e14de in flush_compressed_output (ip=0xdfceb10) at comm.cc:2706
#3  0x080e13e2 in end_compression(interactive_s*) [clone .13871] (ip=0xdfceb10) at comm.cc:2648
#4  0x080e30d2 in remove_interactive (ob=0xe085c18, dested=0) at comm.cc:2101
#5  0x080eaa1a in get_user_data (ip=0xdfceb10) at comm.cc:1318
#6  0x080ea4b5 in on_user_read(int, short, void*) [clone .30107] (fd=23, what=2, arg=0xd59e510) at event.cc:146
#7  0xb74f6ce9 in event_base_loop () from /usr/lib/libevent-2.0.so.5
#8  0x081010fb in run_for_at_most_one_second (base=0xd4e9b58) at event.cc:66
#9  0x080e7c77 in backend (base=0xd4e9b58) at backend.cc:205
#10 0x080ecc17 in main (argc=2, argv=0xbff2a834) at main.cc:394
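
The top of the trace is a send() on a connection that has already gone away. As a standalone illustration of that situation (not the driver's code, and not necessarily the fix that was applied), the sketch below sends to a closed peer with MSG_NOSIGNAL so the failure comes back as EPIPE in errno instead of raising SIGPIPE. send_to_closed_peer() is a made-up name, and a Unix socket pair stands in for the player's TCP connection.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical demo helper: sends one byte to a peer that has already closed.
// Returns the errno reported by send(), 0 if the send succeeded, or -1 if the
// socket pair could not be created.
int send_to_closed_peer() {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
    return -1;
  }
  close(fds[1]);  // simulate the remote side disappearing

  const char byte = 'x';
  // MSG_NOSIGNAL: report the broken pipe through errno rather than SIGPIPE.
  ssize_t sent = send(fds[0], &byte, 1, MSG_NOSIGNAL);
  int err = (sent == -1) ? errno : 0;

  close(fds[0]);
  return err;
}
```

With the flag set, the caller can treat EPIPE like any other write error and skip the flush, rather than relying on signal handling.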

9
Drivers / bug with call_out and shadows
« on: November 23, 2013, 10:03:03 AM »
Here is one of the issues we've run into. Looks like an infinite loop in certain situations. I think this patch fixes it, but I'll leave it up to Fallentree to say for sure.  :)

Code: [Select]
diff --git a/src/call_out.cc b/src/call_out.cc
index ed1bd51..cbef2d1 100644
--- a/src/call_out.cc
+++ b/src/call_out.cc
@@ -170,9 +170,9 @@ void call_out(pending_call_t *cop)
   }
 
 #ifndef NO_SHADOWS
-  if (cop->ob)
-    while (cop->ob->shadowing) {
-      ob = cop->ob->shadowing;
+  if (ob)
+    while (ob->shadowing) {
+      ob = ob->shadowing;
     }
 #endif
   new_command_giver = 0;
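
To spell out the loop in isolation: the old code tested cop->ob->shadowing but assigned to ob, so the condition never changed once it was true. The sketch below is a cut-down stand-in, not the driver's real object_t, and resolve_shadowed() is a made-up name; it only shows the corrected walk, where the cursor being tested is the one being advanced.

```cpp
#include <cassert>

// Stand-in for the driver's object type; only the field the loop touches.
struct object_t {
  object_t *shadowing = nullptr;  // object this one shadows, if any
};

// Follows the shadowing links from `ob` to the last object in the chain.
// Returns nullptr if `ob` is nullptr.
object_t *resolve_shadowed(object_t *ob) {
  if (ob) {
    while (ob->shadowing) {
      ob = ob->shadowing;  // advance the same pointer the condition tests
    }
  }
  return ob;
}
```

With a chain a -> b -> c, resolve_shadowed(&a) ends at c; the pre-patch form would have kept re-reading a's link and never left the loop.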

10
Drivers / Re: memory and cpu use over time
« on: October 21, 2013, 07:32:43 PM »
When I run my mass object loader on our domains directory, it spikes the memory usage. (Obviously.)

Then I do "eval objects()->clean_up(0);" (several runs since there are more objects than the max array size), and eventually pare it down to a constant number of objects that are never supposed to unload.

At this point, the memory usage is quite a bit higher than at boot. Still to be expected, since all those additional no-unload objects are loaded.

Then I do the mass object load again, and then several more runs of clean_up()s, until I'm pared down to no-unload objects (same or similar number to before). But after this second round, my memory usage is significantly higher than after the first round.

Continuously doing these cycles of load/unload causes memory usage to climb, while the number of no-unload objects remains flat.

I'm planning to do more investigation: I haven't been able to run any tests to see what happens in the 2.x series, and I also haven't had a chance to look more closely at the lib code to see if there may be something in there that may be the culprit. I'll post results when I can find some free time.

11
Drivers / Re: memory corruption bug
« on: October 07, 2013, 05:26:06 AM »
Yeah, definitely a FD issue. lsof is how I found the issue in the first place. It topped out just above 1024, and I think the extra few were TCP connections, which presumably don't count toward ulimit.

12
Drivers / Re: memory corruption bug
« on: October 06, 2013, 05:53:50 PM »
Well, out of curiosity, I loaded up 2.24, and it has the exact same issue.

Here's the simplest way to re-create:

Code: [Select]
eval while(1) catch(new("/path/to/bad_obj"));
Where bad_obj is any object that throws an error on load. It doesn't always cause the error, but if you try it a couple times, eventually the eval limit will be hit during compilation, and then the mud is fubar.

It happens in 2.24 and in 3.0, so that probably isn't what caused our mud to crash on 3.0. Which is a bit of a relief, since if it had, it would have meant our lib is in even worse shape than I thought.

The quest continues.

13
Drivers / Re: memory corruption bug
« on: October 06, 2013, 04:22:13 PM »
Looking more closely, the "Object cannot be loaded during compilation" error may be what our crash actually was.

When you try to load an object in this state, the driver keeps the file handle open for the file it's trying to compile. After a while, you end up with too many open files, and the error handler itself errors out, which was the last thing I saw in the log after our crash.
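
As a general illustration of this kind of leak (not the driver's actual code path), the sketch below ties a descriptor to a small guard object so it is closed even when the work using it bails out early. FdGuard and compile_and_fail() are made-up names, and it assumes the error is delivered as a C++ exception; whether the driver's error path unwinds that way is not something I've confirmed.

```cpp
#include <cassert>
#include <fcntl.h>
#include <stdexcept>
#include <unistd.h>

// Closes the wrapped descriptor when it goes out of scope, including during
// exception unwinding.
class FdGuard {
 public:
  explicit FdGuard(int fd) : fd_(fd) {}
  ~FdGuard() {
    if (fd_ >= 0) close(fd_);
  }
  FdGuard(const FdGuard &) = delete;
  FdGuard &operator=(const FdGuard &) = delete;
  int get() const { return fd_; }

 private:
  int fd_;
};

// Hypothetical stand-in for a compile step that aborts partway through.
// Returns the descriptor number that was open while "compiling".
int compile_and_fail(const char *path) {
  int seen = -1;
  try {
    FdGuard file(open(path, O_RDONLY));
    seen = file.get();
    throw std::runtime_error("eval limit reached");  // simulated abort
  } catch (const std::runtime_error &) {
    // By the time we get here the guard has already closed the descriptor.
  }
  return seen;
}

// True if `fd` still refers to an open descriptor in this process.
bool fd_is_open(int fd) { return fcntl(fd, F_GETFD) != -1; }
```

Running compile_and_fail() in a loop leaves the open-file count flat, whereas a bare open() without a matching close() on the error path would creep toward the limit.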

Also, the symptoms of the problem are congruent: a player can actually log in, but gets no output on most commands.

It seems that this state is only reached *sometimes* when the eval limit is hit in the compiler, and I'm not sure what those sometimes are. I never actually experienced the error while in valgrind. I only found it when I tried doing some massive file loads running normally.

I'll try to dig a little more and see if I can figure it out.

14
Drivers / Re: memory corruption bug
« on: October 06, 2013, 03:13:00 PM »
I've been poking at it all weekend, but haven't been able to get it to crash.

The best (worst?) I've been able to do is get it stuck so that trying to load any unloaded object errors with "Object cannot be loaded during compilation." I'm assuming that means it hit the eval limit while in the middle of compiling an object, so it thinks it's there forever.

I'll keep playing around with it, but I won't be able to make much progress during the week.

15
Drivers / Re: memory corruption bug
« on: October 04, 2013, 04:51:42 PM »
That's tonight's project, I'll post here either way.
