I have been developing a program that queries remote servers for a file and then receives it. I have it set up to run as a daemon. The program runs for 10 to 14 hours before it segfaults. I am totally confused, as the crash appears to happen in the fprintf function. That particular call is executed multiple times every 10 minutes (it is in a logging routine). Can anybody give me a clue as to what is going on?
Code:
System details:
Ubuntu 20.04 Server LTS
gcc version: 9.3.0
libc version: 2.31
gdb output:
Code:
Core was generated by `./sysmond_vfprintf_debug receive'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __vfprintf_internal (s=0x0, format=0x56458e7315b8 "%02d/%02d/%04d %02d:%02d:%02d: ", ap=ap@entry=0x7fff3b5b37d0,
mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1328
1328 vfprintf-internal.c: No such file or directory.
(gdb) bt
#0 __vfprintf_internal (s=0x0, format=0x56458e7315b8 "%02d/%02d/%04d %02d:%02d:%02d: ", ap=ap@entry=0x7fff3b5b37d0,
mode_flags=mode_flags@entry=0) at vfprintf-internal.c:1328
#1 0x00007f2f47626c9a in __fprintf (stream=<optimized out>, format=<optimized out>) at fprintf.c:32
#2 0x000056458e7302a5 in log_message (log_file=0x56458ffee4d0 "/var/log/sysmon_xfer.log",
format=0x56458e731516 "Total bytes received: %ld") at sysmond_vfprintf_debug.c:594
#3 0x000056458e72fb1e in receiver () at sysmond_vfprintf_debug.c:431
#4 0x000056458e72ea12 in main (argc=2, argv=0x7fff3b5b4fc8) at sysmond_vfprintf_debug.c:146
(gdb) frame 2
#2 0x000056458e7302a5 in log_message (log_file=0x56458ffee4d0 "/var/log/sysmon_xfer.log",
format=0x56458e731516 "Total bytes received: %ld") at sysmond_vfprintf_debug.c:594
594 fprintf ( fd, "%02d/%02d/%04d %02d:%02d:%02d: ", timeinfo->tm_mon+1, timeinfo->tm_mday, timeinfo->tm_year+1900,
(gdb) list
589
590 fd = fopen ( log_file, "a" );
591
592 time ( &timer );
593 timeinfo = localtime ( &timer );
594 fprintf ( fd, "%02d/%02d/%04d %02d:%02d:%02d: ", timeinfo->tm_mon+1, timeinfo->tm_mday, timeinfo->tm_year+1900,
595 timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec );
596
597 va_start ( args, format );
598 vfprintf ( fd, format, args );
(gdb) print timeinfo->tm_mon
$1 = 2
(gdb) print timeinfo->tm_mday
$2 = 9
(gdb) print timeinfo->tm_year
$3 = 122
(gdb) print timeinfo->tm_hour
$4 = 4
(gdb) print timeinfo->tm_min
$5 = 32
(gdb) print timeinfo->tm_sec
$6 = 1
(gdb)
I don't even need to look at this. Locate the statement and examine carefully what the format-string tells the function to expect. Then, find out which one of the subsequent parameters doesn't match this. If possible, set a breakpoint right before the function is called and examine each one of the parameter values. One of these is being used as "a pointer to something" (no doubt a 'string'), and the value is bogus.
Just backing up the last guy. Your gdb output doesn't print fd.
But while in GDB, you can examine that.
Psst, see my signature for info about core dump analysis using GDB. But be aware, as others have indicated, it's a diagnostic tool, not something which serves up an answer.
Examine variables, likely one of them is invalid when the code was expecting otherwise.
Well, then it's exactly what was speculated. GDB shows that fd is null. That means fopen returned null, and you needed to check whether it returned null.
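To make the point about checking fopen concrete, here is a minimal sketch of what a guarded logging routine could look like. It is not the original poster's code (that was not shown in full). The int return value, the stderr fallback and the trailing newline are assumptions made for illustration. Only the name log_message, its two leading parameters and the timestamp format come from the backtrace.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical rewrite of log_message(): returns 0 on success, -1 on failure
 * (the return value is an assumption, not taken from the original code). */
int log_message(const char *log_file, const char *format, ...)
{
    FILE *fp = fopen(log_file, "a");
    if (fp == NULL) {
        /* Without this check a NULL stream reaches fprintf(), which is the
         * s=0x0 seen in frame #0 of the backtrace. */
        fprintf(stderr, "cannot open %s: %s\n", log_file, strerror(errno));
        return -1;
    }

    time_t timer = time(NULL);
    struct tm *timeinfo = localtime(&timer);
    if (timeinfo != NULL) {
        fprintf(fp, "%02d/%02d/%04d %02d:%02d:%02d: ",
                timeinfo->tm_mon + 1, timeinfo->tm_mday,
                timeinfo->tm_year + 1900,
                timeinfo->tm_hour, timeinfo->tm_min, timeinfo->tm_sec);
    }

    va_list args;
    va_start(args, format);
    vfprintf(fp, format, args);
    va_end(args);
    fputc('\n', fp);   /* assumed: one log entry per line */

    /* Every successful fopen() is paired with exactly one fclose(). */
    return fclose(fp) == 0 ? 0 : -1;
}
```

A call such as log_message("/var/log/sysmon_xfer.log", "Total bytes received: %ld", total) would then report the open failure on stderr instead of crashing. Here total stands for whatever long value the caller is logging.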
You're overwriting fd with a new stream from the fopen( blah, "a") without closing the original stream from the fopen( blah, "r"). Every call leaks one stream, so eventually you're going to run out of file descriptors, and from then on fopen returns NULL.
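The effect can be reproduced in a few seconds rather than 10 to 14 hours. The sketch below is an illustration only: the function name demo_leak, the cap of 64 streams and the temporary soft limit of 16 descriptors are all made up for the demonstration. It opens the same file over and over without closing it, as the faulty routine effectively did, and reports the errno from the first fopen that fails.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

#define MAX_STREAMS 64   /* arbitrary cap for the demonstration */

/* Hypothetical demo: keeps opening `path` without fclose() until fopen()
 * fails. Returns the errno of the failed fopen(), 0 if it never failed
 * within MAX_STREAMS attempts, or -1 if the limit could not be changed. */
int demo_leak(const char *path, int *opened_before_failure)
{
    struct rlimit old_limit, small_limit;
    FILE *leaked[MAX_STREAMS];
    int count = 0;
    int err = 0;

    if (getrlimit(RLIMIT_NOFILE, &old_limit) != 0)
        return -1;

    /* Lower the soft limit so the failure shows up almost immediately. */
    small_limit = old_limit;
    small_limit.rlim_cur = 16;
    if (setrlimit(RLIMIT_NOFILE, &small_limit) != 0)
        return -1;

    while (count < MAX_STREAMS) {
        FILE *fp = fopen(path, "r");
        if (fp == NULL) {
            err = errno;          /* EMFILE once the descriptors run out */
            break;
        }
        leaked[count++] = fp;     /* kept only so we can clean up below */
    }

    /* Clean up and restore the original limit. */
    for (int i = 0; i < count; i++)
        fclose(leaked[i]);
    setrlimit(RLIMIT_NOFILE, &old_limit);

    *opened_before_failure = count;
    return err;
}
```

On a typical Linux system the default soft limit is much higher than 16, which is consistent with a daemon that logs a few times every 10 minutes running for hours before the first failed fopen.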
Doh, what a stupid mistake. I was really looking in the wrong place. Good catch. Thank you!
I should add that the section with the initial fopen, chmod and chown was something I had just added because of the security constraints on the new systems I am working on.
Should have looked at that first thing.
Many thanks to GazL. I guess I just needed another set of eyes on the problem. The problem is solved, and it was caused by my being lazy and just cutting and pasting code. I know better than that. Thanks again for everyone's help.
You're welcome, but I just put the last bits of the puzzle together. Neve' pointed the way initially.
The distance that comes with a fresh set of eyes is often a huge advantage. It's so easy to tunnel-vision yourself into thinking one specific way when you've been looking at a problem for any length of time. I suspect we've all been there. I certainly have.