Ticket #120 (closed defect)

Opened 8 years ago

Last modified 8 years ago

libssh2_session_startup may hang

Reported by: anonymous Owned by: bagder
Priority: normal Milestone:
Component: API Version:
Keywords: Cc: bagder
Blocked By: Blocks:



Our application has been hanging in libssh2_session_startup quite a long time. It's probably a consequence of a
network issue (the connect() function worked so the network issue probablty happened between these two functions)
The application has been not responding for more than 60 seconds, and it has been restarted automatically by a monitor.

Fortunately a backtrace has been generated:
(gdb) bt
#0 0x00007ffeae99cce2 in select () from /lib/libc.so.6
#1 0x00007ffeaec3a458 in _libssh2_wait_socket (session=0x7ffea801a7d0) at session.c:520
#2 0x00007ffeaec3ad46 in libssh2_session_startup (session=0x7ffea801a7d0, sock=15) at session.c:687

I cannot see any timeout in the select(). The problem is we cannot control how much time it may take. We
have used setsockopt to set a read/write timeout on the socket we pass to libssh2, but the socket timeouts
have no effect with select.

So would it be possible to add an extra argument to libssh2_session_startup() so that we can have
a timeout and we can just return and error instead of hanging. We are using libssh2-1.2.


Change History

comment:1 Changed 8 years ago by bagder

This isn't a bug actually. It's simply so that the blocking interface for libssh2 has no timeout anywhere, so all blocking functions do risk to "hang" for a very long time if there's no network activity.

As the comment in _libssh2_wait_socket() says:

/* Note that this COULD be made to use a timeout that perhaps could be

customizable by the app or something... */

but I think that rather than adding a timeout to a single function, it should rather be something more generic so that all blocking functions would get benefit from such a timeout.

comment:2 Changed 8 years ago by anonymous

Thanks for your quick reply. That's right, a global timeout option would be ideal. But I think it may be more important in that function, because in other places the libssh2 the user program has a way to set a timeout:
1) the program does the connect() so it can do the setsockopt() to put a timeout
2) for the standard libssh2_channel_read() and libssh2_channel_write() functions we can use the non blocking mode using libssh2_session_set_blocking()

The socket timeout is effective whenever libssh2 is doing a simple read/write on the socket. It's really a problem there because libssh2 is using a select() which makes the socket timeout option ineffective.


comment:3 Changed 8 years ago by bagder

If non-blocking is an option, why can't you just use it for the libssh2_session_startup() call too?

The select() done by libssh2 is there _only_ to provide the blocking API. If you use it non-blocking the app does the select()ing...

comment:4 Changed 8 years ago by anonymous

It's a good idea: I changed to the non-blocking mode and had to use loops to retry while it returns EAGAIN, and it seems to work. It's just a bit more complicated.

Anyway, the global timeout option will be great when it's implemented. Thanks for your work on libssh2.

comment:5 Changed 8 years ago by bagder

closing this, as it isn't actually a bug but more a missing feature...

Note: See TracTickets for help on using tickets.