Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hts_log_error() extension for better error reporting #1397

Open
lczech opened this issue Feb 14, 2022 · 5 comments
Open

hts_log_error() extension for better error reporting #1397

lczech opened this issue Feb 14, 2022 · 5 comments
Assignees

Comments

@lczech
Copy link

lczech commented Feb 14, 2022

Hi there!

This is more of a general suggestion to improve error reporting, but I'll illustrate it with a concrete example.

I have started using the sam/bam/cram functionality recently, and got an (expected) error "The input is not sorted" on an unsorted bam file during testing. This error is reported in bam_plp_push() here, but not propagated from there to the original caller of the function - the error status gets lost.

My code calls bam_plp_auto(), and checks the result for null to detect whether the current position is usable (I gathered that this is the way this is to be used). That function returns a bam_plp_s* (typedef'ed as bam_plp_t), which however is only declared in the sam.h header, and then defined in sam.c here. Hence, the bam_plp_t field error, which is correctly set by bam_plp_push(), is not accessible from my code to check whether an error occurred.

Furthermore, hts_log() here prints the error directly to stderr, and hence further complicates for me to catch this and report any errors (in my case, via throwing an exception in C++).

It seems hence impossible for me to detect in my code whether a file has simply ended or has an error during iteration.

This all is of course only an example, and seems applicable to many other functions as well. Am I overlooking a simple solution here? What would be a good way of solving this?

One idea that might improve the situation is to offer to set a callback function that gets called in hts_log() with the htsLogLevel and the final error message. Users could then set this callback, and react in their own code to the error message, for example by throwing an exception (or whatever C users do in that situation :-D).

Thank you in advance and all the best
Lucas

@whitwham
Copy link
Contributor

Let's start with your concrete example.

The two occasions where bam_plp_auto is used in samtools the error detection is done by looking at the last argument. If it is < 0 then an error has occurred. This is not documented anywhere. As my colleague said in answer to your other question (#1398); our documentation is poor in many places and very poor for this set of functions.

For the iterator we could add a couple of accessor functions for the eof of and error values of bam_plp_t. They might be helpful.

I'm not sure what the answer to the general case of htslib error handling is. I will have to think more about it.

@whitwham
Copy link
Contributor

whitwham commented Mar 1, 2022

Unless someone comes up with a concrete proposal. I don't think we are going to change the general logging anytime soon.

For the functions mentioned, we do need more documentation.

@lczech
Copy link
Author

lczech commented Mar 23, 2022

Concrete proposal, with no changes to the established API - only adding new functionality:

// ============================================================
//     hts_log.h
// ============================================================

// ---------------------------------------------
//     New:
// ---------------------------------------------

/// If set (default), log messages are printed to stderr.
extern int hts_log_to_stderr;

/// Callback function for additional logging by users.
extern void (*hts_log_callback)( enum htsLogLevel severity, char const* context, char const* message );

/// Set whether to log to stderr.
HTSLIB_EXPORT
void hts_set_log_to_stderr(int value);

/// Set the callback function for additional logging by users.
HTSLIB_EXPORT
void hts_set_log_callback(
    void (*callback)( enum htsLogLevel severity, char const* context, char const* message )
);

// Need to export this now as well, so that users can use it in the callback function.
/// Get a char representation of a log severity level:
/// HTS_LOG_ERROR:   'E'
/// HTS_LOG_WARNING: 'W'
/// HTS_LOG_INFO:    'I'
/// HTS_LOG_DEBUG:   'D'
/// HTS_LOG_TRACE:   'T'
static char get_severity_tag(enum htsLogLevel severity);

// ---------------------------------------------
//     Amending documentation:
// ---------------------------------------------

// Same signature for the log function as before.
// Maybe amend the documentation by:
/// By default, log messages are written to stderr. This can be deactivated by calling
/// hts_set_log_to_stderr( 0 ).
/// Furthermore, a log callback function can be set with hts_set_log_callback(), so that user code
/// is called, with the log severity level, context as determined by the type of log message
/// (see get_severity_tag for details), and the message. This allows users to react to log messages,
/// for example by terminating the program when an error occurs.
HTSLIB_EXPORT
void hts_log(enum htsLogLevel severity, const char *context, const char *format, ...)
HTS_FORMAT(HTS_PRINTF_FMT, 3, 4);
// ============================================================
//     hts.c
// ============================================================

// ---------------------------------------------
//     New:
// ---------------------------------------------

/// Initialize so that by default, logs are printed to stderr.
HTSLIB_EXPORT
int hts_log_to_stderr = 1;

/// Initialize the log callback function to not do anything.
void (*hts_log_callback)( enum htsLogLevel, char const*, char const* ) = nullptr;

/// Set whether to log to stderr.
void hts_set_log_to_stderr(int value)
{
    hts_log_to_stderr = value;
}

/// Set the callback function for additional logging by users.
void hts_set_log_callback(
    void (*callback)( enum htsLogLevel severity, char const* context, char const* message )
)
{
    hts_log_callback = callback;
}

// ---------------------------------------------
//     Changed function:
// ---------------------------------------------

/// Log messages, either to stderr (if hts_log_to_stderr is set), and/or by calling the log
/// callback function hts_log_callback, which can be set by hts_set_log_callback().
void hts_log(enum htsLogLevel severity, const char *context, const char *format, ...)
{
    int save_errno = errno;
    if (severity <= hts_verbose) {

        // Print to stderr
        if( hts_log_to_stderr ) {
            fprintf(stderr, "[%c::%s] ", get_severity_tag(severity), context);
            va_list argptr;
            va_start(argptr, format);
            vfprintf(stderr, format, argptr);
            va_end(argptr);
            fprintf(stderr, "\n");
        }

        // If a callback is given, also call it to allow users to do their own logging
        if( hts_log_callback ) {
            // Buffer size as needed - not sure what the longest possible error messages are,
            // but if there are file names etc involved, one might want a large buffer.
            int const buffer_size = 1024 * 1024;
            char buffer[buffer_size];

            // Need to repeat some of the above va_... processing here, 
            /// as re-using `va_list` leads to a segfault (even if it were in the same scope).
            va_list argptr;
            va_start(argptr, format);
            vsnprintf( buffer, buffer_size, format, argptr );
            hts_log_callback( severity, context, buffer );
            va_end(argptr);
        }
    }
    errno = save_errno;
}
// ============================================================
//     main.cpp
// ============================================================

// ---------------------------------------------
//     Example:
// ---------------------------------------------

void my_log_function( enum htsLogLevel severity, char const* context, char const* message )
{
    std::cout << "[" << get_severity_tag( severity ) << "] In function " << context << ": ";
    std::cout << message << "\n";
    if( severity == HTS_LOG_ERROR ) {
        throw std::runtime_error( "Error occurred. Exiting." );
    }
}

int main()
{
    // Deactivate logging to stderr and instead set the log callback function.
    // For C++, this can also be a lambda function.
    hts_set_log_to_stderr(0);
    hts_set_log_callback( &my_log_function );

    hts_log_error( "%s", "some test error" );
}

Feel free to edit as you see fit. This would greatly help to customize error handling of htslib functions!

PS: Cannot reopen this issue myself, but please do so if this is interesting to you!

@jmarshall
Copy link
Member

I too would like finer-grained ways of telling HTSlib to shut up with its logging, so if it were small and self-contained IMHO a logging callback facility might have legs. It wouldn't help much with the particular bam_plp_auto() case, but could be useful in general when applications would prefer to control noise on stderr.

There's no need to facilitate enabling both the default logging and a callback — if the callback wants both, it can just do a similar fprintf(stderr, …) itself in addition to whatever else it wants to do. PR #1414 simplifies this idea down to its essentials and avoids the need for a buffer in hts_log().

@daviesrob daviesrob reopened this Mar 23, 2022
@daviesrob
Copy link
Member

This looks like a reasonable solution for the single-thread case. The tricky part will be what to do when hts_log() is called from a thread, which I'm pretty sure could happen. We could just punt the problem and document that the callback can be called in a thread. Otherwise we may need to come up with a solution that allows messages that originate in threads to be forwarded to the thread that is handling any messages. Hmm, I guess it might be possible to do this by also providing a callback that writes into a ring buffer...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants