Re: [Freeipmi-devel] lib: raw command and threads


From: Albert Chu
Subject: Re: [Freeipmi-devel] lib: raw command and threads
Date: Mon, 02 Dec 2013 14:10:37 -0800

Hi Thomas,

Sounds like it could be this issue:

http://www.gnu.org/software/freeipmi/freeipmi-faq.html#Why-am-I-seeing-so-many-_0027internal-IPMI-error_0027-messages_003f

While I haven't seen segfaults yet, it's very believable it could
happen.

Al

On Mon, 2013-12-02 at 18:16 +0100, Thomas Cadeau wrote:
> Hi all,
> 
> I am back with the same result on "clean" nodes.
> 
> Using a simple program, I get an answer in 3 ms with the driver active:
> $ lsmod |grep ipmi
>     ipmi_devintf            8145  2
>     ipmi_si                42497  1
> And an answer in 17 ms without the driver.
> 
> Inside our project, we have no problem with the drivers loaded, but without
> them we always hit the memory issue described below.
> Note that this will be a real problem with the new RHEL kernel.
> We really need to fix this.
> 
> I will come back tomorrow to see which part of our code we can share for
> the moment.
> The project is quite big, but the only difference I see from the simple
> program is that the call is made inside a thread.
> 
> Thomas
> 
> On 22/11/2013 16:51, Thomas Cadeau wrote:
> > Thanks a lot for your answer.
> >
> > The approach you propose does not fit what we want to do.
> > I re-ran on "safe" CPUs without any trouble.
> >
> > Once I have a real pool of CPUs without any other issues, I
> > will let you know whether the problem occurs again.
> >
> > Thomas
> >
> > On 21/11/2013 20:23, Albert Chu wrote:
> >> Hi Thomas,
> >>
> > >> I did a quick sanity test on my system and it worked (of course, it may
> > >> not have been exactly how you did things).
> >>
> >> The trace indicates the segfault is here:
> >>
> >>> #0  0x00007f4e278c89a9 in inb (ctx=0x7f4e28001770) at
> >>>> /usr/include/sys/io.h:48
> > >> Which is during port I/O (inb).  I suppose a segfault could happen if
> >> the in/out call was going to a bad part of memory.  It might suggest
> >> some corruption is happening.  Is it possible you're corrupting some
> >> data structure somewhere?  The close/destroy/re-create works b/c it
> >> fixes the corruption?
> >>
> >> In all of FreeIPMI (especially the multi-ranged host access in the
> >> tools), we create a context per thread for communication, e.g.
> >>
> >> launch_thread
> >>     ctx = ipmi_ctx_create();
> >>     ipmi_ctx_find_inband(ctx, ...);
> >>     loop
> >>        ipmi_cmd_raw
> >>
> >> Have you considered doing it this way?
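
The per-thread pattern outlined above can be sketched in C with pthreads. This is only an illustration, not FreeIPMI code: so that it builds without libfreeipmi, the `demo_*` functions, `struct demo_ctx`, `struct worker_arg`, and `run_workers` are hypothetical stand-ins for `ipmi_ctx_create`, `ipmi_ctx_find_inband`, `ipmi_cmd_raw`, and `ipmi_ctx_close`/`ipmi_ctx_destroy`. The point it shows is that the context is created, opened, used, and destroyed by the same thread. One possible benefit, not confirmed anywhere in this thread, is that per-thread state set up while opening the driver stays with the thread that issues the commands (on Linux, port permissions granted by ioperm(2)/iopl(2) apply to the calling thread).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for ipmi_ctx_t so the sketch builds without libfreeipmi. */
struct demo_ctx {
    pthread_t owner;   /* thread that opened the context */
    int       opened;  /* nonzero once demo_ctx_open() has succeeded */
};

/* Stand-in for ipmi_ctx_create(). */
static struct demo_ctx *demo_ctx_create(void)
{
    return calloc(1, sizeof(struct demo_ctx));
}

/* Stand-in for ipmi_ctx_find_inband(): remember which thread opened it. */
static int demo_ctx_open(struct demo_ctx *ctx)
{
    ctx->owner = pthread_self();
    ctx->opened = 1;
    return 0;
}

/* Stand-in for ipmi_cmd_raw(): fails when called from any other thread. */
static int demo_cmd_raw(struct demo_ctx *ctx)
{
    if (!ctx->opened || !pthread_equal(ctx->owner, pthread_self()))
        return -1;
    return 0;
}

/* Stand-in for ipmi_ctx_close() followed by ipmi_ctx_destroy(). */
static void demo_ctx_destroy(struct demo_ctx *ctx)
{
    free(ctx);
}

struct worker_arg {
    int iterations;  /* number of raw commands to issue */
    int completed;   /* number that succeeded, written by the worker */
};

/* Thread body: the context never leaves the thread that created it. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    struct demo_ctx *ctx = demo_ctx_create();

    if (ctx == NULL)
        return NULL;
    if (demo_ctx_open(ctx) == 0) {
        for (int i = 0; i < arg->iterations; i++)
            if (demo_cmd_raw(ctx) == 0)
                arg->completed++;
    }
    demo_ctx_destroy(ctx);
    return NULL;
}

/* Start nthreads workers; return the total of successful commands, or -1
   if a thread could not be started. */
int run_workers(int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    int started = 0, total = 0;

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);

    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].completed = 0;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += args[i].completed;
    }
    return (started == nthreads) ? total : -1;
}
```

Calling `run_workers(4, 5)` starts four threads that each issue five commands on their own context. In a real program each stand-in would be replaced by the corresponding libfreeipmi call, with every return code checked.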
> >>
> >> Al
> >>
> >>
> >> On Thu, 2013-11-21 at 17:00 +0100, Thomas Cadeau wrote:
> >>> Hi all,
> >>>
> >>>
> > >>> I am currently trying to call a raw command several times.
> >>> Here are the functions I call:
> >>>
> >>>> ctx = ipmi_ctx_create()
> >>>>
> >>>> ipmi_ctx_find_inband (ctx,
> >>>>                    NULL,//&driver_type,
> >>>>                    0,   // disable_auto_probe,
> >>>>                    0,   // driver_address,
> >>>>                    0,   // register_spacing,
> >>>>                    0,   // driver_device,
> >>>>                    0,   // workaround_flags,
> >>>>                    IPMI_FLAGS_DEFAULT//0
> >>>>                    )
> >>>>
> >>>> ipmi_cmd_raw(ctx,
> >>>>               0x00, //lun (logical unit number)
> >>>>               0x3A,//IPMI_NET_FN_SENSOR_EVENT_RQ,
> >>>>               bytes_rq, //request data //const void *
> >>>>               2, //length (in bytes)
> >>>>               bytes_rs, //response buffer //void *
> >>>>               IPMI_RAW_MAX_ARGS //max response length
> >>>>               )
> > >>> I check all return codes.
> >>>
> >>> If I create a simple example with a loop, I have no problem.
> >>>> ctx = ipmi_ctx_create()
> >>>> ipmi_ctx_find_inband ( ...  )
> >>>> for (...){
> >>>> ipmi_cmd_raw(...)
> >>>> //use result
> >>>> }
> > >>> Then I tried it inside an internal project: during initialization I
> > >>> call the 3 functions, and then each time I want to update and call
> > >>> ipmi_cmd_raw(...), a thread is created to do all the operations.
> >>>
> >>>> ctx = ipmi_ctx_create()
> >>>> ipmi_ctx_find_inband ( ...  )
> >>>>   ipmi_cmd_raw(...)
> >>>>   //use result
> >>>> ...
> >>>> //with fixed frequency:
> >>>> launch thread
> >>>>         > ipmi_cmd_raw(...)
> >>>>         > //use result
> > >>> In this case, on some CPUs I have no problem, but on others I get a
> > >>> segfault (core dump):
> >>>> #0  0x00007f4e278c89a9 in inb (ctx=0x7f4e28001770) at
> >>>> /usr/include/sys/io.h:48
> >>>> #1  _ipmi_kcs_get_status (ctx=0x7f4e28001770) at
> >>>> driver/ipmi-kcs-driver.c:533
> >>>> #2  0x00007f4e278c8e50 in _ipmi_kcs_wait_for_ibf_clear
> >>>> (ctx=0x7f4e28001770)
> >>>>      at driver/ipmi-kcs-driver.c:656
> >>>> #3  0x00007f4e278c91d6 in ipmi_kcs_write (ctx=0x7f4e28001770,
> >>>> buf=0x7f4e28003420, buf_len=3)
> >>>>      at driver/ipmi-kcs-driver.c:845
> >>>> #4  0x00007f4e27898bc1 in _kcs_cmd_write (ctx=0x7f4e28005190,
> >>>> obj_cmd_rq=<value optimized out>,
> >>>>      obj_cmd_rs=0x7f4e28001ae0) at api/ipmi-kcs-driver-api.c:255
> >>>> #5  api_kcs_cmd (ctx=0x7f4e28005190, obj_cmd_rq=<value optimized out>,
> >>>> obj_cmd_rs=0x7f4e28001ae0)
> >>>>      at api/ipmi-kcs-driver-api.c:398
> >>>> #6  0x00007f4e27899091 in api_kcs_cmd_raw (ctx=0x7f4e28005190,
> >>>> buf_rq=0x7f4e2e390a60, buf_rq_len=2,
> >>>>      buf_rs=0x7f4e2e38f8c0, buf_rs_len=4512) at
> >>>> api/ipmi-kcs-driver-api.c:750
> >>>> #7  0x00007f4e2788f9a9 in ipmi_cmd_raw (ctx=0x7f4e28005190, lun=<value
> >>>> optimized out>,
> >>>>      net_fn=<value optimized out>, buf_rq=0x7f4e2e390a60, 
> >>>> buf_rq_len=2,
> >>>> buf_rs=0x7f4e2e38f8c0,
> >>>>      buf_rs_len=4512) at api/ipmi-api.c:1983
> > >>> If I force a reconnect, I have no problem, but this workaround is
> > >>> not a good solution:
> >>>> ctx = ipmi_ctx_create()
> >>>> ipmi_ctx_find_inband ( ...  )
> >>>>   ipmi_cmd_raw(...)
> >>>>   //use result
> >>>> ...
> >>>> //with fixed frequency:
> >>>> launch thread
> >>>>         > ipmi_ctx_close(ctx)
> >>>>         > ipmi_ctx_destroy(ctx);
> >>>>> ctx = ipmi_ctx_create()
> >>>>> ipmi_ctx_find_inband ( ...  )
> >>>>         >ipmi_cmd_raw(...)
> >>>>         > //use result
> > >>> Note that I checked the BMC version on each node, and I use
> > >>> freeipmi-1.2.1.
> > >>> I also have a safeguard to ensure that ctx is used by only one caller
> > >>> at a time.
> >>>
> > >>> Do you have any idea what is happening and whether I'm doing something
> > >>> wrong? Is there a function to check whether the connection is open and
> > >>> whether I need to reopen it?
> >>>
> >>> Thank you for your help.
> >>>
> >>> Thomas Cadeau
> >>>
> >
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




