Hi James,

Re: https://support.opensciencegrid.org/support/tickets/public/f43c02bc6b243d50…

Unfortunately, when I look I see plenty of what look like glideins submitting on the 4-core entry, but none are registering back, and we are back in the earlier situation where glidein logs no longer make it back to the factory for me to debug with.

A slight correction to what you said above: our hosted CE submits to ldas-osg-ce.ligo-wa.caltech.edu, not ldas-osg-dev.ligo-wa.caltech.edu, and it is the 4-core glideins from the osgpilot user that we are worried about. So if the admins can find the stderr / stdout of any of these running glideins, it would help us get to the bottom of it (on ldas-osg-ce.ligo-wa.caltech.edu):

[osgpilot@ldas-osg-ce ~]$ condor_q -const 'owner=?="osgpilot" && RequestCpus==4 && jobstatus==2'

-- Schedd: ldas-osg-ce.ligo-wa.caltech.edu : <10.21.201.31:9618?... @ 05/16/24 12:30:52
OWNER    BATCH_NAME  SUBMITTED   DONE  RUN  IDLE  TOTAL  JOB_IDS
osgpilot ID: 273599  5/14 13:07  _     1    _     1      273599.0
osgpilot ID: 278125  5/16 00:09  _     1    _     1      278125.0
osgpilot ID: 278130  5/16 00:09  _     1    _     1      278130.0
osgpilot ID: 278134  5/16 00:11  _     1    _     1      278134.0
osgpilot ID: 278135  5/16 00:11  _     1    _     1      278135.0
osgpilot ID: 278186  5/16 00:19  _     1    _     1      278186.0
osgpilot ID: 278223  5/16 00:59  _     1    _     1      278223.0

Total for query: 7 jobs; 0 completed, 0 removed, 0 idle, 7 running, 0 held, 0 suspended
Total for osgpilot: 903 jobs; 0 completed, 0 removed, 896 idle, 7 running, 0 held, 0 suspended
Total for all users: 1231 jobs; 0 completed, 0 removed, 1086 idle, 12 running, 133 held, 0 suspended
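For anyone digging on the CE: the stdout/stderr locations for these jobs live in the job ClassAds, so a query like the sketch below (built as a dry run here; the constraint matches the one in the query above) should point the admins at the right files. This is an assumption about how the logs are laid out, not a confirmed procedure:

```shell
# Sketch: print where each matching job's working dir and stdout/stderr live.
# Iwd, Out and Err are standard HTCondor job-ad attributes; -af:j prefixes
# each row with the job id. Shown as a dry run; run the printed command on the CE.
CONSTRAINT='owner =?= "osgpilot" && RequestCpus == 4 && JobStatus == 2'
CMD="condor_q -const '${CONSTRAINT}' -af:j Iwd Out Err"
echo "${CMD}"
```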
On Mon, 13 May at 11:54 AM, James Clark <james.clark(a)ligo.org> wrote:
[EXTERNAL] – This message is from an external sender

Hi Jeff,

Well, I've spent all morning digging around with zero hints of what's up. Here's as much as I can find; if none of the following sheds any light, is it worth trying to schedule a call to debug live?

1. IGWN and OSPool glideins are definitely running. Of the currently running glideins, it looks like they entered that state generally about 15+ minutes ago [1].

2. The only glidein logs I see are from the brief burst of fairly erratic activity around 05/02 - 05/04. Some of those logs complain about the stashcp version, but that was an unrelated problem from when I broke the frontend; the other logs I see from that period look completely normal to me.

3. During those couple of days of activity, a mix of single- and 4-core jobs went exclusively through the 8-core entry; nothing went through the 4-core entry.

4. (Presumably) tangential to the above problems, but it looks like the GPU entry is still trying to use the old self-hosted CE hostname.

5. I asked about access to what I thought was the local schedd you were talking to, ldas-osg-dev.ligo-wa.caltech.edu, and Dan (cc'd) noted:

> There is no ldas-osg-dev per se. Its GC address is being used by osg-gw.dcs.ligo-wa.caltech.edu, through which I had been routing traffic associated with the osgpilot and osg01 users (using an ip rule specific to uidrange 9991-9999). At least part of that traffic now goes through fontier.ligo-wa.caltech.edu.

I'm not sure if that has any implications, but it predates the small burst of activity that we did have, so presumably it's not a fundamental blocker (?).

[1] CE queue queries:

Totals:

condor_ce_q -name ligo-hanford-ce1.svc.opensciencegrid.org -pool collector.opensciencegrid.org:9619 -tot

-- Schedd: ligo-hanford-ce1.svc.opensciencegrid.org : <192.170.231.37:9619?...
@ 05/13/24 06:59:56

Total for query: 1406 jobs; 0 completed, 0 removed, 1342 idle, 62 running, 2 held, 0 suspended
Total for all users: 1406 jobs; 0 completed, 0 removed, 1342 idle, 62 running, 2 held, 0 suspended

Running jobs:

$ condor_ce_q -name ligo-hanford-ce1.svc.opensciencegrid.org -pool collector.opensciencegrid.org:9619 -run

-- Schedd: ligo-hanford-ce1.svc.opensciencegrid.org : <192.170.231.37:9619?... @ 05/13/24 07:00:29
ID       OWNER     SUBMITTED   RUN_TIME    HOST(S)
75459.0  osgpilot  5/12 19:12  0+00:02:44
75475.0  osgpilot  5/12 19:12  0+00:02:44  batch condor osgpilot(a)ldas-osg-ce.ligo-wa.caltech.edu --rgahp-glite ~/bosco/glite
<snip>

Entered current state this long ago:

$ for time in `condor_ce_q -run -name ligo-hanford-ce1.svc.opensciencegrid.org -pool collector.opensciencegrid.org:9619 -af EnteredCurrentStatus`; do echo "$(($(date +%s)-${time}))"; done
1092
1098
1061
1000
<snip>
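The elapsed-time loop above is just "now minus EnteredCurrentStatus" per job. A self-contained version with a fixed reference time and made-up timestamps (so the arithmetic is easy to check; the real thing substitutes `date +%s` and the condor_ce_q output):

```shell
# Same arithmetic as the one-liner above, but with a fixed "now" and
# hard-coded EnteredCurrentStatus values (both hypothetical) for illustration.
NOW=1715608829                      # stand-in for $(date +%s)
for t in 1715607737 1715607731; do  # stand-ins for EnteredCurrentStatus values
  echo "$((NOW - t))"               # seconds since the job entered its current state
done
```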
On 5/13/24 09:37, James Alexander Clark wrote:
> Hi Jeff,
>
> Just a heads up that I'm taking a deeper dive on this today.
>
> I'm gathering data now but first thing I notice is the tiny burst of
> jobs that made it through around 05/03 seemed to go exclusively through
> the 8-core entry; nothing went through the 4-core or GPU entries.
>
> Not sure if there's any significance to that?
>
> More info as I get it...
>
> On 5/9/24 12:38, James Alexander Clark wrote:
>> Hi Jeff,
>>
>> I was messing around with the FE config this week to try to better
>> control the stashcp version but (should have..) reverted any changes
>> related to production yesterday.
>>
>> I also haven't noticed validation completely failing as a result of
>> any of those changes.
>>
>> Any thoughts Jason?
>>
>> On 5/9/24 12:29, Jeffrey Dost wrote:
>>> Hi James,
>>>
>>> *Re:*
>>> https://support.opensciencegrid.org/support/tickets/public/f43c02bc6b243d50…
>>>
>>> I'm not sure how to interpret the FE monitoring but I'm seeing
>>> glideins are finally sending logs back to the factory.. looks like
>>> all of them are failing validation with the following:
>>>
>>> Validation failed in condor_startup.sh.
>>>
>>> Cannot extract STASHCP_VERSION from 'glidein_config'
>>>
>>> I've never seen this error before.. do you or Jason have any ideas?
>>>
>>> Jeff
>>>
>>> On Thu, 2 May at 3:57 PM, James Clark <james.clark(a)ligo.org> wrote:
>>> [EXTERNAL] – This message is from an external sender
>>>
>>> Hi Jeff,
>>>
>>> I've asked internally. I do see a small number of payload histories but
>>> will launch some more targeted tests too.
>>>
>>> In the meantime, anyone know why the Frontend doesn't appear to know
>>> about these? Am I looking at the wrong traces?
>>>
>>> https://vo-frontend-igwn.igwn-prod.chtc.io/vofrontend/monitor/frontendStatu…
>>>
>>> On 5/2/24 15:22, Jeffrey Dost wrote:
>>> > Did any intervention happen at the site? I see glideins are running
>>> > there now.
>>>
>>> --
>>> James Alexander Clark
>>> LIGO Laboratory
>>> California Institute of Technology
>>> email: james.clark(a)ligo.org
>>> Tel. (cell): 413-230-1412
>>> _______________________________________________
>>> Igwn-dhtc-ops mailing list -- igwn-dhtc-ops(a)nikhef.nl
>>> To unsubscribe send an email to igwn-dhtc-ops-leave(a)nikhef.nl
>>>
>>> 76253:196291

--

**Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will be off work every second Friday from 17th May onwards.

James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412
There is a new comment in the ticket submitted by James Clark to OSG User Documentation Ticket URL: https://support.opensciencegrid.org/support/tickets/public/4d0b40167271fc70… Comment added by : Paschalis Paschos Comment Content: <div>[EXTERNAL] – This message is from an external sender</div>
<div></div>
<div>
<div dir="ltr">Do you have a quota on that <span style="font-family:verdana,arial,sans-serif;font-size:13px">/home/osg.factory/ </span>directory? </div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr"></div>
</div>
</div>
<div class="freshdesk_quote"><blockquote class="freshdesk_quote">
<div>On Thu, Apr 25, 2024 at 1:11 PM Clark, James A. <<a href="mailto:jaclark@caltech.edu" rel="noreferrer">jaclark(a)caltech.edu</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="auto">Hi Pascal,</div>
<div dir="auto">
<br>
</div>
<div dir="auto">Right, I moved the old logs out of the way and let it start over. I'm still on my way back home from Madison, but it did look like it had repopulated when I looked yesterday. </div>
<div dir="auto">
<br>
</div>
<div dir="auto">I will check in a little later today/tomorrow. <span></span>
</div>
<div>
<br>
</div>
<div id="m_1621215644628360949ms-outlook-mobile-signature" dir="auto">Get <a href="https://urldefense.com/v3/__https://aka.ms/AAb9ysg__;!!BpyFHLRN4TMTrA!5cCZm…" target="_blank" rel="noreferrer">
Outlook for Android</a>
</div>
<hr style="display:inline-block;width:98%">
<div id="m_1621215644628360949divRplyFwdMsg" dir="ltr">
<font style="font-size:11pt" color="#000000"><b>From:</b> Paschalis Paschos <<a href="mailto:support@opensciencegrid.org" target="_blank" rel="noreferrer">support(a)opensciencegrid.org</a>><br>
<b>Sent:</b> Thursday, April 25, 2024 11:26:24 AM<br>
<b>To:</b> <a href="mailto:james.clark@ligo.org" target="_blank" rel="noreferrer">james.clark(a)ligo.org</a> <<a href="mailto:james.clark@ligo.org" target="_blank" rel="noreferrer">james.clark(a)ligo.org</a>><br>
<b>Cc:</b> <a href="mailto:paschos@uchicago.edu" target="_blank" rel="noreferrer">paschos(a)uchicago.edu</a> <<a href="mailto:paschos@uchicago.edu" target="_blank" rel="noreferrer">paschos(a)uchicago.edu</a>>;
<a href="mailto:agraves10@unl.edu" target="_blank" rel="noreferrer">agraves10(a)unl.edu</a> <<a href="mailto:agraves10@unl.edu" target="_blank" rel="noreferrer">agraves10(a)unl.edu</a>>;
<a href="mailto:jdost@ucsd.edu" target="_blank" rel="noreferrer">jdost(a)ucsd.edu</a> <<a href="mailto:jdost@ucsd.edu" target="_blank" rel="noreferrer">jdost(a)ucsd.edu</a>>;
<a href="mailto:igwn-dhtc-ops@nikhef.nl" target="_blank" rel="noreferrer">igwn-dhtc-ops(a)nikhef.nl</a> <<a href="mailto:igwn-dhtc-ops@nikhef.nl" target="_blank" rel="noreferrer">igwn-dhtc-ops(a)nikhef.nl</a>>; Clark, James A. <<a href="mailto:jaclark@caltech.edu" target="_blank" rel="noreferrer">jaclark(a)caltech.edu</a>><br>
<b>Subject:</b> Re: [#76259] IGWN pool glidein logs</font>
<div> </div>
</div>
<div>
<div style="font-family:verdana,arial,sans-serif;font-size:13px">
<div style="font-family:verdana,arial,sans-serif;font-size:13px">
<div dir="ltr">
<div dir="ltr">Hi Jeff,</div>
<div>
<br>
</div>
<div>
<strong>Re:</strong> <a href="https://urldefense.com/v3/__https://support.opensciencegrid.org/support/tic…" rel="noreferrer" target="_blank">
https://support.opensciencegrid.org/support/tickets/public/4d0b40167271fc70…</a>
</div>
<div>
<br>
</div>
<div dir="ltr">There has got to be a process (cron) that rsyncs factory pilot logs for LIGO back to CIT. </div>
<div dir="ltr">
<br>
</div>
<div dir="ltr">James, I see logs in /home/osg.factory/gfaclogs/sdsc/igwn from various sites</div>
<div dir="ltr">
<br>
</div>
<div dir="ltr">- P.</div>
<div dir="ltr">
<br>
</div>
<div>
<br>
</div>
<div>
<br>
</div>
</div>
<div dir="ltr">
<div>
<br>
</div>
</div>
<div>
<blockquote>On Wed, 24 Apr at 1:20 PM <span>, Clark, James A. <<a href="mailto:jaclark@caltech.edu" target="_blank" rel="noreferrer">jaclark(a)caltech.edu</a>> wrote:
<div>[EXTERNAL] – This message is from an external sender</div>
<div></div>
<div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi Jeff,</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
At the time, I was having trouble finding recent glidein logs for any entry and couldn't find correctly named directories for the LIGO-WA entries (capital C suffix, rather than lower case).</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
I noticed some more recent logs yesterday, however, so let me dig around a bit more.</div>
<div style="font-family:Aptos,Aptos_EmbeddedFont,Aptos_MSFontService,Calibri,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
<br>
</div>
<div id="m_1621215644628360949x_appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_1621215644628360949x_divRplyFwdMsg" dir="ltr"><font color="#000000" style="font-size:11pt"><b></b></font></div>
</div>
<div>
<blockquote>
<div>From: Jeffrey Dost <<a href="mailto:support@opensciencegrid.org" target="_blank" rel="noreferrer">support(a)opensciencegrid.org</a>><br>
<b>Sent:</b> Tuesday, April 23, 2024 12:14 PM<br>
<b>To:</b> <a href="mailto:james.clark@ligo.org" target="_blank" rel="noreferrer">james.clark(a)ligo.org</a> <<a href="mailto:james.clark@ligo.org" target="_blank" rel="noreferrer">james.clark(a)ligo.org</a>><br>
<b>Cc:</b> <a href="mailto:paschos@uchicago.edu" target="_blank" rel="noreferrer">paschos(a)uchicago.edu</a> <<a href="mailto:paschos@uchicago.edu" target="_blank" rel="noreferrer">paschos(a)uchicago.edu</a>>;
<a href="mailto:agraves10@unl.edu" target="_blank" rel="noreferrer">agraves10(a)unl.edu</a> <<a href="mailto:agraves10@unl.edu" target="_blank" rel="noreferrer">agraves10(a)unl.edu</a>>;
<a href="mailto:jdost@ucsd.edu" target="_blank" rel="noreferrer">jdost(a)ucsd.edu</a> <<a href="mailto:jdost@ucsd.edu" target="_blank" rel="noreferrer">jdost(a)ucsd.edu</a>>;
<a href="mailto:igwn-dhtc-ops@nikhef.nl" target="_blank" rel="noreferrer">igwn-dhtc-ops(a)nikhef.nl</a> <<a href="mailto:igwn-dhtc-ops@nikhef.nl" target="_blank" rel="noreferrer">igwn-dhtc-ops(a)nikhef.nl</a>><br>
<b>Subject:</b> Re: [#76259] IGWN pool glidein logs </div>
<div> </div>
<div>
<div style="font-family:verdana,arial,sans-serif;font-size:13px">
<div style="font-family:verdana,arial,sans-serif;font-size:13px">
<div dir="ltr">
<div>Hi James,</div>
<div>
<br>
</div>
<div>
<strong>Re:</strong> <a href="https://urldefense.com/v3/__https://support.opensciencegrid.org/support/tic…" rel="noreferrer" target="_blank">
https://support.opensciencegrid.org/support/tickets/public/4d0b40167271fc70…</a>
</div>
<div>
<br>
</div>
<div dir="ltr">Can you please confirm which CIT entries you want me to investigate? Looks like we have several, but I see this one serving a lot of glideins:</div>
<div><a href="https://urldefense.com/v3/__http://gfactory-2.opensciencegrid.org/factory/m…" rel="noreferrer" target="_blank">http://gfactory-2.opensciencegrid.org/factory/monitor/factoryStatus.html?en…</a></div>
<div>
<br>
</div>
<div dir="ltr">Thanks,</div>
<div dir="ltr">Jeff<br>
</div>
<div>
<br>
</div>
</div>
<div dir="ltr">
<div>
<br>
</div>
</div>
<div>
<blockquote>On Tue, 23 Apr at 11:07 AM <span>, Paschalis Paschos <<a href="mailto:support@opensciencegrid.org" target="_blank" rel="noreferrer">support(a)opensciencegrid.org</a>> wrote:
<div style="font-family:verdana,arial,sans-serif;font-size:13px">
<div dir="ltr">
<div>Hi James,</div>
<div>
<br>
</div>
<div>
<strong>Re:</strong> <a href="https://urldefense.com/v3/__https://support.opensciencegrid.org/support/tic…" rel="noreferrer" target="_blank">
https://support.opensciencegrid.org/support/tickets/public/4d0b40167271fc70…</a>
</div>
<div>
<br>
</div>
<div dir="ltr">I am not sure how to check on that. I don't see any logs on CIT. Is it a pull from CIT via a cron job, or is the factory actively sending those to you? <br>
</div>
<div>
<br>
</div>
</div>
<div dir="ltr">
<div>
<br>
</div>
</div>
<div>
<blockquote>On Fri, 19 Apr at 11:00 AM <span>, James Clark <<a href="mailto:james.clark@ligo.org" target="_blank" rel="noreferrer">james.clark(a)ligo.org</a>> wrote:
<div>[EXTERNAL] – This message is from an external sender<br>
<br>
As noted in #76253 I'm having a hard time finding any recent factory<br>
glidein logs in the rsync'd directory at CIT.<br>
<br>
Can we confirm access is working ok? I know there was a mysterious<br>
issue with the ITB logs which seemed to resolve itself.<br>
<br>
<br>
--<br>
James Alexander Clark<br>
LIGO Laboratory<br>
California Institute of Technology<br>
email: <a href="mailto:james.clark@ligo.org" target="_blank" rel="noreferrer">james.clark(a)ligo.org</a><br>
Tel. (cell): 413-230-1412<br>
</div>
</span>
</blockquote>
</div>
</div>
</span>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</span>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
<div>
<br>
</div>
<span class="gmail_signature_prefix">-- </span><br>
<div dir="ltr" class="gmail_signature">
<div dir="ltr">
<font><i>Pascal Paschos, Ph.D.</i></font>
<div>
<i style='font-family:"times new roman",serif'>OSG/PATh Collaboration Support </i><br>
</div>
<div>
<font><i>Enrico Fermi Institute - </i></font><i style='font-family:"times new roman",serif'>University of Chicago</i>
</div>
<div><font><i>ph: 773-702-4679</i></font></div>
</div>
</div>
</blockquote></div>
Hi James,

Re: https://support.opensciencegrid.org/support/tickets/public/f43c02bc6b243d50…

I'm not sure how to interpret the FE monitoring, but I'm seeing glideins finally sending logs back to the factory.. looks like all of them are failing validation with the following:

Validation failed in condor_startup.sh.

Cannot extract STASHCP_VERSION from 'glidein_config'

I've never seen this error before.. do you or Jason have any ideas?

Jeff
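For reference, glidein_config is normally a plain-text file of one "KEY value" pair per line, so the failing check amounts to an extraction like the sketch below. The file contents and version number here are made up for illustration; the real validation lives in condor_startup.sh:

```shell
# Minimal sketch of the extraction that is failing, against a synthetic
# glidein_config (assumed "KEY value"-per-line layout; values are hypothetical).
cat > glidein_config.example <<'EOF'
GLIDEIN_Name gfactory_instance
STASHCP_VERSION 6.10.1
EOF
# Take the last matching line, drop the key, keep the rest of the line:
version=$(awk '$1 == "STASHCP_VERSION" {sub(/^[^ ]+ /, ""); v=$0} END {print v}' glidein_config.example)
if [ -z "${version}" ]; then
  echo "Cannot extract STASHCP_VERSION from 'glidein_config'"
else
  echo "STASHCP_VERSION=${version}"
fi
```

An empty or truncated glidein_config on the worker node would reproduce the "Cannot extract" message above.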
On Thu, 2 May at 3:57 PM, James Clark <james.clark(a)ligo.org> wrote:
[EXTERNAL] – This message is from an external sender

Hi Jeff,

I've asked internally. I do see a small number of payload histories but will launch some more targeted tests too.

In the meantime, anyone know why the Frontend doesn't appear to know about these? Am I looking at the wrong traces?

https://vo-frontend-igwn.igwn-prod.chtc.io/vofrontend/monitor/frontendStatu…
On 5/2/24 15:22, Jeffrey Dost wrote:
> Did any intervention happen at the site? I see glideins are running
> there now.

--
James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412
_______________________________________________
Igwn-dhtc-ops mailing list -- igwn-dhtc-ops(a)nikhef.nl
To unsubscribe send an email to igwn-dhtc-ops-leave(a)nikhef.nl
Hi James,

Re: https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…

I've been wrestling with how best to tackle this for you. In the past, we've whitelisted direct access to the k8s API server, but I'd really like to move away from that. The alternative is to set up a SOCKS proxy through a bastion host, and I'd like to set one up specifically for Tiger access. Depending on how quickly you need this access, I see two paths forward:

1. Spin up a Tiger bastion host. I could shuffle around some effort and try to get this done in the next week or two.

2. Allow IGWN operators to proxy through ap42. I think we may have to poke a firewall hole. You'll have to update your SSH configurations whenever we do set up the Tiger bastion, though.

- Brian
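For what it's worth, the client side of the SOCKS option usually looks something like the fragment below. The host, hostname, and port are hypothetical placeholders, not real CHTC values; kubectl picks up a socks5 proxy from the standard HTTPS_PROXY environment variable:

```shell
# Hypothetical ~/.ssh/config entry for a bastion (all names are placeholders):
#
#   Host tiger-bastion
#       HostName bastion.example.chtc.wisc.edu
#       DynamicForward 1080
#
# With that in place, open the tunnel and route kubectl API traffic through it:
#
#   ssh -f -N tiger-bastion
#   export HTTPS_PROXY=socks5://localhost:1080
#   kubectl -n igwn-prod get pods
```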
On Thu, 9 May at 9:00 AM, James Clark <james.clark(a)ligo.org> wrote:
[EXTERNAL] – This message is from an external sender

Hi Brian,

Jason just pointed out that you proxy kubectl, and I'll need some level of access to tigermaster0000. Is that something we can make happen?

Thanks
On 5/8/24 09:56, James Alexander Clark wrote:
> Thanks, Brian.
>
> Do I need to be logged in anywhere special?
>
> $ k get pods -n igwn-prod
> E0508 09:49:52.557811 151972 memcache.go:265] couldn't get current
> server API group list: Get
> "https://tigermaster0000.chtc.wisc.edu:6443/api?timeout=32s": dial tcp
> 128.104.103.139:6443: i/o timeout
> E0508 09:50:22.559320 151972 memcache.go:265] couldn't get current
> server API group list: Get
> "https://tigermaster0000.chtc.wisc.edu:6443/api?timeout=32s": dial tcp
> 128.104.103.139:6443: i/o timeout
> E0508 09:50:52.561430 151972 memcache.go:265] couldn't get current
> server API group list: Get
> "https://tigermaster0000.chtc.wisc.edu:6443/api?timeout=32s": dial tcp
> 128.104.103.139:6443: i/o timeout
> E0508 09:51:22.563689 151972 memcache.go:265] couldn't get current
> server API group list: Get
> "https://tigermaster0000.chtc.wisc.edu:6443/api?timeout=32s": dial tcp
> 128.104.103.139:6443: i/o timeout
> E0508 09:51:52.565449 151972 memcache.go:265] couldn't get current
> server API group list: Get
> "https://tigermaster0000.chtc.wisc.edu:6443/api?timeout=32s": dial tcp
> 128.104.103.139:6443: i/o timeout
> Unable to connect to the server: dial tcp 128.104.103.139:6443: i/o timeout
>
> Same results for -n igwn-dev on my local machine and on
> ospool-ap2042.chtc.wisc.edu
>
> On 5/7/24 18:37, Brian Lin wrote:
>> Hi James,
>>
>> *Re:*
>> https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…
>>
>> I've added Josh and Duncan to the group as well.
>>
>> - Brian
>>
>> On Tue, 7 May at 4:27 PM, James Clark <james.clark(a)ligo.org> wrote:
>> [EXTERNAL] – This message is from an external sender
>>
>> Excellent, thanks!
>> On 5/7/24 17:11, Brian Lin wrote:
>> > Hi James,
>> >
>> > *Re:*
>> > https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…
>> >
>> > Do you have a list of names other than yourself that need access?
>>
>> In addition to myself, I suggest starting with Josh Willis and Duncan
>> Macleod as the people most likely to make (and with any history of making)
>> FE-related changes.
>>
>> That will hopefully grow, but I don't expect it to grow very long or
>> quickly, so I'm not worried about any automation there.
>>
>> > I've given you the appropriate permissions and so you should be able to fetch
>> > your credentials through https://login.tiger.chtc.io/login/tiger-dex.
>> > What kind of host will you be running 'kubectl' from? Depending on the OS,
>> > we've got different config tips to streamline k8s API access.
>>
>> For now, pretty much just my laptop (Ubuntu 23.10), conceivably with
>> something like k9s/lens - mostly I just want to be able to see if pods
>> are up and maybe look at their logs for now (I'm pretty sure I just
>> broke our ITB FE but my only handle on that is the 404 I get on the
>> status page...).
>>
>> Thanks!
>>
>> > - Brian
>> >
>> > On Tue, 7 May at 3:49 PM, Aaron Moate <support(a)opensciencegrid.org> wrote:
>> > Hi James,
>> >
>> > *Re:*
>> > https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…
>> >
>> > Received. An appropriate party should be getting back to you shortly.
>> >
>> > Cheers,
>> > Aaron Moate
>> > PATh Ops Team
>> >
>> > On Tue, 7 May at 2:13 PM, James Clark <james.clark(a)ligo.org> wrote:
>> > [EXTERNAL] – This message is from an external sender
>> >
>> > Can we please provision kubectl access in the igwn-prod k8s namespace so
>> > that IGWN operators like me can see what the frontend is doing?
>> >
>> > I am/have been making changes here and there to the configs in
>> > git.ligo.org and that's been working great, but it'd be even better if I
>> > could see what's going on if (when) I eventually break something.
>> >
>> > --
>> >
>> > **Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will
>> > be off work every second Friday from 17th May onwards.
>> >
>> > James Alexander Clark
>> > LIGO Laboratory
>> > California Institute of Technology
>> > email: james.clark(a)ligo.org
>> > Tel. (cell): 413-230-1412
>> >
>> > 76388:196291
>>
>> --
>>
>> **Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will
>> be off work every second Friday from 17th May onwards.
>>
>> James Alexander Clark
>> LIGO Laboratory
>> California Institute of Technology
>> email: james.clark(a)ligo.org
>> Tel. (cell): 413-230-1412
>> _______________________________________________
>> Igwn-dhtc-ops mailing list -- igwn-dhtc-ops(a)nikhef.nl
>> To unsubscribe send an email to igwn-dhtc-ops-leave(a)nikhef.nl
>>
>> 76388:196291

--

**Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will be off work every second Friday from 17th May onwards.

James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412
Hi James,

Re: https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…

I've added Josh and Duncan to the group as well.

- Brian
On Tue, 7 May at 4:27 PM, James Clark <james.clark(a)ligo.org> wrote:
[EXTERNAL] – This message is from an external sender

Excellent, thanks!
On 5/7/24 17:11, Brian Lin wrote:
> Hi James,
>
> *Re:*
> https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…
>
> Do you have a list of names other than yourself that need access?

In addition to myself, I suggest starting with Josh Willis and Duncan
Macleod as the people most likely to make (and with any history of making)
FE-related changes.

That will hopefully grow, but I don't expect it to grow very long or
quickly, so I'm not worried about any automation there.

> I've given you the appropriate permissions and so you should be able to fetch
> your credentials through https://login.tiger.chtc.io/login/tiger-dex.
> What kind of host will you be running 'kubectl' from? Depending on the OS,
> we've got different config tips to streamline k8s API access.

For now, pretty much just my laptop (Ubuntu 23.10), conceivably with
something like k9s/lens - mostly I just want to be able to see if pods
are up and maybe look at their logs for now (I'm pretty sure I just
broke our ITB FE but my only handle on that is the 404 I get on the
status page...).

Thanks!

> - Brian
>
> On Tue, 7 May at 3:49 PM, Aaron Moate <support(a)opensciencegrid.org> wrote:
> Hi James,
>
> *Re:*
> https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…
>
> Received. An appropriate party should be getting back to you shortly.
>
> Cheers,
> Aaron Moate
> PATh Ops Team
>
> On Tue, 7 May at 2:13 PM, James Clark <james.clark(a)ligo.org> wrote:
> [EXTERNAL] – This message is from an external sender
>
> Can we please provision kubectl access in the igwn-prod k8s namespace so
> that IGWN operators like me can see what the frontend is doing?
>
> I am/have been making changes here and there to the configs in
> git.ligo.org and that's been working great, but it'd be even better if I
> could see what's going on if (when) I eventually break something.
>
> --
>
> **Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will
> be off work every second Friday from 17th May onwards.
>
> James Alexander Clark
> LIGO Laboratory
> California Institute of Technology
> email: james.clark(a)ligo.org
> Tel. (cell): 413-230-1412
>
> 76388:196291

--

**Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will be off work every second Friday from 17th May onwards.

James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412
_______________________________________________
Igwn-dhtc-ops mailing list -- igwn-dhtc-ops(a)nikhef.nl
To unsubscribe send an email to igwn-dhtc-ops-leave(a)nikhef.nl
Hi James,

Re: https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…

Do you have a list of names other than yourself that need access? I've given you the appropriate permissions, so you should be able to fetch your credentials through https://login.tiger.chtc.io/login/tiger-dex. What kind of host will you be running 'kubectl' from? Depending on the OS, we've got different config tips to streamline k8s API access.

- Brian
On Tue, 7 May at 3:49 PM, Aaron Moate <support(a)opensciencegrid.org> wrote:
Hi James,
Re: https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…
Received. An appropriate party should be getting back to you shortly.
Cheers,
Aaron Moate
PATh Ops Team
On Tue, 7 May at 2:13 PM, James Clark <james.clark(a)ligo.org> wrote:
[EXTERNAL] – This message is from an external sender

Can we please provision kubectl access in the igwn-prod k8s namespace so that IGWN operators like me can see what the frontend is doing?

I am/have been making changes here and there to the configs in git.ligo.org and that's been working great, but it'd be even better if I could see what's going on if (when) I eventually break something.

--

**Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will be off work every second Friday from 17th May onwards.

James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412
Hi James,

Re: https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…

Received. An appropriate party should be getting back to you shortly.

Cheers,
Aaron Moate
PATh Ops Team
On Tue, 7 May at 2:13 PM, James Clark <james.clark(a)ligo.org> wrote:
[EXTERNAL] – This message is from an external sender

Can we please provision kubectl access in the igwn-prod k8s namespace so that IGWN operators like me can see what the frontend is doing?

I am/have been making changes here and there to the configs in git.ligo.org and that's been working great, but it'd be even better if I could see what's going on if (when) I eventually break something.

--

**Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will be off work every second Friday from 17th May onwards.

James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412
James Clark submitted a new ticket to OSG User Documentation and requested that we copy you.

Ticket URL: https://support.opensciencegrid.org/support/tickets/public/aec5c8f8f836acc0…

Ticket Description:

[EXTERNAL] – This message is from an external sender
Can we please provision kubectl access in the igwn-prod k8s namespace so
that IGWN operators like me can see what the frontend is doing?
I am/have been making changes here and there to the configs in
git.ligo.org and that's been working great but it'd be even better if I
could see what's going on if (when) I eventually break something.
--
**Please note:** LIGO Lab has moved to a 9/80 work schedule, so I will
be off work every second Friday from 17th May onwards.
James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412
Hi James,

Re: https://support.opensciencegrid.org/support/tickets/public/e9a978006b84410b…

Sounds like this is resolved.

Jason
On Wed, 1 May at 4:25 PM, Jason Patton <support(a)opensciencegrid.org> wrote:
Hi James,
Re: https://support.opensciencegrid.org/support/tickets/public/e9a978006b84410b…
So the frontend pod seems fine. As for the issue you were seeing in the logs previously (https://git.ligo.org/computing/distributed/igwn-fe-configs/-/merge_requests…): the latest condor tarball on the ITB factory is condor-23.5.2, so it's not finding 23.6.2. I'll put in a request to get 23.6.2 uploaded to the ITB factory.
Jason
On Wed, 1 May at 12:00 PM, Jason Patton <support(a)opensciencegrid.org> wrote:
Hi James,
Re: https://support.opensciencegrid.org/support/tickets/public/e9a978006b84410b…
I will take a look later today; it could just be the same problem we had with the production frontend, where the PVC ran out of space after staging 100-something copies of the Pelican client.
Jason
On Wed, 1 May at 11:05 AM, James Clark <james.clark(a)ligo.org> wrote:
[EXTERNAL] – This message is from an external sender

The IGWN ITB FE does not seem to be responding to job pressure. See e.g.:

https://vo-frontend-igwn.igwn-dev.chtc.io/vofrontend/monitor/frontendStatus…

where I submitted a bunch of jobs last week with MY.is_itb=True. I just submitted another similar batch, but there's not much happening.

--
James Alexander Clark
LIGO Laboratory
California Institute of Technology
email: james.clark(a)ligo.org
Tel. (cell): 413-230-1412