Discussion: Bug in SportIdent system

Not sure how official it is, but the Swedes have been taking it seriously, and it got more public attention after Tove was reinstated a couple of weeks ago.
http://orienterare.nu/t/2453/p/2
See posts 87-93.

Peter L, who seems to be the pioneer here, has been investigating forest units for a couple of years to check what is going on. For example, the Hollie Orr case has been studied, and it is a bit of a strange case too.
http://orienterare.nu/t/2462
Posts 13 onward.

Oct 5, 2017 8:18 PM #

To make it clear, this is not about punching too fast or anything else an athlete can do. It is about the mark getting written over at the next control, or something like that. That's why the mark can't be found on the card even though it is still in the control unit the runner punched. And the only way to sort it out is to investigate the units in the forest. This is not against Swedish national rules; I think they changed them recently to allow this. Peter L states he has reinstated 10+ runners since he started investigating forest units in 2012, with 3 cases so far this year. So it definitely happens frequently.

Good to see this is getting investigated.

Oct 5, 2017 8:41 PM #

mikeminium:

Does anyone know if this has happened multiple times to the same person (same SI card)? That would suggest a flaw in particular cards.

Oct 5, 2017 10:08 PM #

Once it has happened to you, you don't doubt it.

Oct 5, 2017 10:13 PM #

bubo:

@mikeminium
I don't know if the same person has experienced this on more than one occasion, but I can confirm that a friend of mine had four consecutive controls at a recent event that were not written to his SI card. DSQ'd at the event but later reinstated.

Oct 5, 2017 10:19 PM #

bubo:

Re: within the rules or not
As Jagge mentioned above, there are additional comments in the organizers' "handbook" (not in the actual rules though) saying that on suspicion of "technical failure" they are allowed to reinstate runners if the punch can be read out from the control unit.

Oct 6, 2017 12:00 AM #

RLShadow:

"Checking forest units is a very much against the IOF rules at high level events"

Why? That makes no sense at all, especially in light of this bug where someone can punch properly, and not have it recorded on their SI card.

Oct 6, 2017 12:22 AM #

EricW:

Many of us have been waiting for the answer to this, for years, but apparently it is an IOF secret.

Oct 6, 2017 1:09 AM #

GuyO:

Could this happen with an SI-11 that flashed (indicating a successful punch) at the missing control?

Oct 6, 2017 2:07 AM #

If the issue is writing over the "punch" record at the following control, then yes.

Oct 6, 2017 2:58 AM #

rm:

In the past, the manufacturer responded to concerns by providing a detailed description of what happens during a punch. From what I recall, that explanation does not address the concern of punches being overwritten by later punches. It would be interesting to hear the manufacturer's response to this concern (what aspects of the design or testing suite might address this, what they've done to reproduce or investigate, etc.).

Oct 6, 2017 3:23 AM #

rm:

As an aside, if I were investigating these reported problems, I would want to check and rule out the possibility that the punches are in fact recorded on the card but not showing due to an error in the firmware of the download unit, or an error in the results software. I don't know whether this has been investigated? Most systems tend to have numerous potential points of failure. (To be fair, pin punches had failure modes too, like phantom punches made by mud coming through pin holes in the plastic map case, caused by previous punches and subsequent slight movement of the map case, back when American maps for national events often had the punch card within a sealed map unit. Nevertheless, that was investigated and discovered as a failure mechanism, and pin patterns of mud were wiped off.) Investigation of the current punch systems seems worthwhile given their use for myriad events from small to world championship, so I'm glad that some people seem to be doing so. I hope that the manufacturer can weigh in with what they've been doing.

Oct 6, 2017 4:03 AM #

mikeminium:

Several years ago, one of our club members had an SI 5 fail on 3 consecutive punches. In a subsequent test, the same 3 (like 7,8 and 9) failed again. The card was sent back to the manufacturer and replaced. Something was said about the memory being in groups of 3, and one segment failed. But this sounds very different from the problems today.

Is there any documentation of what types of cards, what kinds of control units, what level of firmware, battery level, and if anyone else tried to punch (or held their punch very close) at the same time, or any possible broadcast EM interference like a cell phone or gps watch? Seems like there are a whole lot of variables that need to be eliminated by thoroughly documenting each known failure.

Oct 6, 2017 5:48 AM #

Who is confirming these cases? SportIdent? Swedish Orienteering?
http://www.svenskorientering.se/Nyheter/forbundsny...

If I understand that correctly, Swedish Orienteering expects the firmware to now be rewritten from scratch.

Oct 6, 2017 7:24 AM #

pi:

Here is a translation of two of the linked posts in Swedish:

Peter Löfås - 2017-09-24
---------------------------------------
What we have seen the last two years is something else, and there are cases with all stick types (SI 5, 6, 8, 9, 10, 11) and for several of the symptoms there is no way to see during download that something went wrong in the forest. (In some cases it is possible to tell already at download but it will depend on when during the race the problem happened and so on).

I have seen the following cases:

1. Unit leaves a hole on the stick, i.e. writes data at a position further ahead than it should => data can be recreated by reading raw data and downloading more punches than the stick thinks it has stored => event software like OLA, MeOS etc. will claim that controls towards the end of the course are missing.

2. Unit overwrites earlier punches in the stick (typical scenario: at control 5 the unit will write data for control 1 in the stick and move the pointer for the next control to 2; the next punch at a unit will then overwrite 2 without reporting an error, and so on) => event software will claim that one or more controls during the course are missing (what happened to Tove today).

3. Finish punch is not written correctly => event software will claim that finish punch is missing despite that it was done correctly. This case is tricky because if the runner goes back to finish punch again you will incorrectly get a worse time than you should have.
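
A minimal sketch of case 2 above, assuming a toy card model with a list of punch slots and a single write pointer. The real SportIdent memory layout is not described in this thread, so the `card` dictionary, the `rewind_to` fault injection and the control codes are all hypothetical. It only illustrates why download software would report the early controls as missing.

```python
def punch(card, control, rewind_to=None):
    """Write one punch at the card's write pointer, then advance the pointer.

    rewind_to is a hypothetical fault injection: the unit wrongly uses that
    slot instead of the pointer stored on the card.
    """
    pos = card["pointer"] if rewind_to is None else rewind_to
    card["slots"][pos] = control
    card["pointer"] = pos + 1


def download(card):
    """Event software is assumed to trust only the slots below the pointer."""
    return card["slots"][:card["pointer"]]


course = [31, 32, 33, 34, 35, 36, 37]  # made-up control codes
card = {"slots": [None] * 16, "pointer": 0}

for control in course:
    # Fault at the 5th control: data lands in the 1st slot, pointer moves to the 2nd.
    punch(card, control, rewind_to=0 if control == 35 else None)

seen = download(card)
missing = [c for c in course if c not in seen]
print(seen)     # [35, 36, 37]
print(missing)  # [31, 32, 33, 34]
```

In this toy model the fourth punch (34) is still physically on the card, but it sits beyond the pointer, so a normal download never looks at it. The first three punches are gone from the card and would only be recoverable from the control units.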

The root cause of these bugs is still unknown; what I have logged so far is that it affects all types of sticks, all types of base units, and both newly produced and older units.

SportIdent continuously receives error reports/raw data files from myself and others but has so far not been able to find the problem.

Swedish Orienteering Federation - 2017-10-05
---------------------------------------------------------------------------

What is SportIdent doing?

SOFT has a dialogue with SportIdent who releases the following statement:

- That they have received a number of cases where error has been confirmed.
- That considering the errors are very infrequent it is difficult, close to impossible, to reproduce the error.
- That they will rewrite the firmware in the units from scratch. This work is estimated to be completed around New Year.

Is the SportIdent system still approved to be used at competitions?

Yes, it is. SportIdent's punching and timekeeping system still meets the requirements that SOFT considers necessary for approval for use at competitions in Sweden. This presumes that SportIdent acknowledges and addresses the known issues that have been identified in the system, and it is SOFT's opinion that SportIdent is doing this.

SOFT continuously evaluates approved hardware and software for events and engages in a continuous dialogue with the developers of these systems.

Oct 6, 2017 7:30 AM #

Thanks pi.

Oct 6, 2017 9:05 AM #

I haven't attempted to properly read and understand this thread. However I'm curious if anyone has seen what I describe below before? It sounds like it might be one of the scenarios described above.

At a most excellent event in 2014 my SI card registered almost nothing correctly.

If you search for 'Baxter' in the pdf below you can see what the event software says the card recorded.
http://explorerevents.co.uk/wp-content/uploads/201...

It is the same SI card I am using today. It's been used at many events in many countries before and since.

This is the only time this has happened.

I checked the events immediately prior to this one to see if I could recognise anything in the garbled data but couldn't.

As it was a mountain race with a high proportion of participants less familiar with SI the organiser held the clear box and would not let you proceed through the start lane until you had cleared. He had his arm across my chest.

The time is recorded accurately, just not the punches.

It was a three hour event with around 15 controls. Needless to say fast punching was not top of my priorities. I'm quite sure I heard bleeps distinct from the other noises in my head.

110 was the last control I visited before the finish.

Oct 6, 2017 11:00 AM #

To me it seems fairly obvious what is happening here:

The SI sticks contain a slice of persistent storage which can be rewritten, it is controlled with a single record which is a write pointer.

In a Clear unit the pointer is reset to zero, then on each control unit visit the control will read the current pointer value, then write the control number and the current time-of-day to the next slot and then update the write pointer, right?

The lamp flash and beep are supposed to come only after the unit has verified that the entire record has been written correctly and the write pointer updated, but if this process can fail, i.e. you get a partial write, then you could end up with a stick which contains a faulty write pointer.

The funny part here is that before both EMIT and SportIdent, the IOF put down the requirements for a timing system, and the need for backup was a given. In the EMIT case this forced the unit to be big enough to carry a paper pin hole marker, while for SI the control units were supposed to be the electronic backup, along with a set of old pin punches to be used when the control was broken.

Within just a few events in Sweden it became clear that it was just too much work to collect all those control units and download the data from them, so they simply changed the rules and took away that part. I.e. checking the punches in the control units became optional instead of something a competitor could ask for.

If we assume the sticks can have marginal (flash?) writeable memory, then it would seem like an obvious idea to have at least two write pointers, located in separate blocks of the chip and require both to be updated before you get the beep. On subsequent controls you would use the larger of the two and write a marker in case of any inconsistent data, but since this would require an update to the firmware of every control it would probably be hard to do?
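
A rough sketch of the two-pointer idea in the paragraph above, under the same toy assumptions (slots in a list, pointers as plain integers). The names `ptr_a`, `ptr_b` and `flagged` are hypothetical, and nothing here reflects how SI firmware actually works. It only shows the suggested rule: use the larger pointer and leave a marker when the two disagree.

```python
def new_card(size=16):
    return {"slots": [None] * size, "ptr_a": 0, "ptr_b": 0, "flagged": False}


def punch(card, control):
    """Punch using two write pointers; returns True when the unit may beep."""
    if card["ptr_a"] != card["ptr_b"]:
        card["flagged"] = True          # marker for the download software
    pos = max(card["ptr_a"], card["ptr_b"])
    card["slots"][pos] = control
    card["ptr_a"] = pos + 1             # first copy, e.g. in one memory block
    card["ptr_b"] = pos + 1             # second copy, in a separate block
    # Beep only once both copies have been verified.
    return card["ptr_a"] == card["ptr_b"] == pos + 1


card = new_card()
for control in [31, 32, 33, 34]:
    punch(card, control)

card["ptr_a"] = 0                       # simulate one pointer being corrupted
for control in [35, 36]:
    punch(card, control)

print(card["slots"][:6])  # [31, 32, 33, 34, 35, 36]
print(card["flagged"])    # True
```

Taking the larger pointer protects against a pointer that jumps backwards. A pointer corrupted to a value that is too large would instead leave a hole, although the marker would still tell the download software that something went wrong.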

Oct 6, 2017 3:37 PM #

rm:

If all control units were radio controls that mesh networked, then checking the controls could be less of a nuisance, and real time results would be easier, as well as knowing the last punch of tardy participants. Download of the card would become the redundancy. If some people want the sport more watchable, this would be a significant improvement. This may be the better fix than a rewrite (with new bugs?) or delaying results to collect all units when a dispute arises. It may be grander than some prefer, but surely wouldn't be that much more expensive than what we have. Technology moves on.

Oct 6, 2017 3:48 PM #

andrewd:

This is pretty incredible, good work from Peter for getting it as far as he has! Will be interesting to see how it develops. Glad I'm not a regular behind a computer at events any more!

So all NGBs need to be asking the following:
How should we be treating competitors who are adamant they did visit a control which didn't register?
Interrogating SI boxes just adds a whole new layer of complexity (and time) to an already complex and time consuming job at O events.

In response to this:
""Checking forest units is a very much against the IOF rules at high level events"

Why? That makes no sense at all, especially in light of this bug where someone can punch properly, and not have it recorded on their SI card."

Lots of reasons; time, perception, stubbornness.

Time - At WOC there are flower/medal ceremonies, as well as a set period of time for appeals after a race is over. There may not be enough time to get a forest unit back within this window. Take the WOC 2015 long, for example: it might take an hour or more to get a unit back. We did have 2 mispunches that day, but the IOF threw their appeals out. I think for SI Air the punches aren't written to the unit though, so there may be no way to tell (I can't remember the exact details of SI Air, I'm not an expert).

Perception - The IOF don't want to be using a system that has failures. If SI are re-writing firmware then the IOF may be very keen to avoid using it until they can assess the update.

Stubbornness - I have seen first hand that common sense can be ignored by IOF officials for the sake of a rulebook / personal opinion.

Oct 6, 2017 3:55 PM #

"Checking forest units is a very much against the IOF rules at high level events"
Why? That makes no sense at all,

The idea is that a legal punch involves waiting for the beep/flash. The write to unit happens before that. So the existence of a record in the box doesn't mean you punched correctly.
It's almost impossible to punch so fast that there's no record in the box. Runners could start relying on box-checks, which creates a lot of work and delay for the results team.
I don't say it's a good idea, but it does make sense.

In my experience, radio controls are the most likely to cause trouble, perhaps due to the additional demands on the battery.

AFAIK SIair leaves no record in the box.

Oct 6, 2017 4:16 PM #

pi:

Peter Löfås' error cases are for correct punch, i.e. this is NOT for cases when the runner removes the stick before the write to the unit completes as these are marked ErrA/ErrB/ErrC in the unit.

So these are legitimate errors where the runner held the stick in the unit until the write completed and the beep/flash occurred. The punch was written... just to the wrong spot.

Oct 6, 2017 4:20 PM #

rm:

Relying on cards alone does eliminate the backup though. The beep/flash does indicate that the write to the card occurred successfully, but doesn't eliminate the possibility that something subsequent overwrites or damages the punch, or memory fails, or other failure mechanisms.

Our SAR trackers, which radio a few miles back to base every two minutes, last at least 48 hours in operation (verified last weekend), and are much lighter and a bit smaller (pack of cards) than control units.

Perhaps redundancy in the card is another solution, combining Terje's suggestion with a second location for the punch, perhaps using separate software. Memory keeps getting cheaper.

(Of course, ultimately I think that moving to generic mobile devices may make the most sense. We manage to afford ten thousand dollars for an epunch system, but balk at using mobile devices that most hikers seem to carry into the woods already, given the number of rescue calls directly from the subjects, whom SAR or sheriff's dispatch then directs to findmesar.com to give us their coordinates, so we can just go fetch them without a search. A mobile device system for registration, start and results could be entirely self service, leaving volunteers to course set and give beginner and intermediate instruction and training. I already hate the most common results software; adding in a trip to the field to fetch controls to download sounds even worse. Why do we keep opting for the time consuming, tedious, onerous ways? Orienteering is already crazy high effort, much needlessly.)

Oct 6, 2017 4:35 PM #

andrewd:

@JimBaker because those devices aren't accurate enough to know if you went within ~30cm of the control, which is a requirement of (high level) orienteering. Fine for non-orienteering navigation races, like AR.

Oct 6, 2017 5:43 PM #

mikee:

Statement from SportIdent: https://www.sportident.com/news.html

Oct 6, 2017 6:45 PM #

pi:

Not quite matching what PL is reporting.

Oct 6, 2017 10:39 PM #

rm:

@andrewd: There are inexpensive devices called beacons which could be used as control "punches" with mobile devices.

Oct 6, 2017 11:12 PM #

"Because the station erroneously treats the punch as successful, it also gives feedback (beep and flash), making the incorrect punch indiscernible from a correct punch to the athlete. "

"The error therefore occurs extremely rarely."

No shit Sherlock.

We've known this for fifteen years. The issue is not with the technology - it is with the rules and their interpretation, or lack of.

If I understand him correctly I agree with Terje. The removal of effective backup for pragmatic reasons was a mistake. Although I was also persuaded by Jagge's past arguments that you should get no feedback and assume the punch has happened and sort it out afterwards ... but don't quiz me on his logic because I'll struggle

Electronic punching is similar to data privacy. We tacitly accept that anyone suitably motivated can know our secrets because of the convenience it provides us and because the possibility it will harm us as an individual is extremely rare - both because we are not individuals of interest and because there are many individuals. Which is fine, until you become an individual of interest or the unlucky statistic.

We accept the rules relating to electronic punching in a similar way. We accept the "extremely rarely" fallibility of technology because it offers massive benefits - particularly in countries where there isn't a lot of forest and great courses can now be fitted within small areas.

But we don't need our noses rubbed in it. No-one likes that. No-one likes to be humiliated. Computer says no. Being told you didn't punch correctly when you know you did - well ... that's just plain humiliating. That's how wars start.

And when the person who stands to gain the most from your punching demise stands up and says "I was with her, I saw her punch" and still that is not enough the sport simply loses all credibility.

Which from my eyrie it has.

Yet navigating as fast as you can through unknown terrain using only a map and compass is simply the best sport in the whole wide world.

Except Kabaddi.

And, back on topic, there appear to be other "extremely rarely" bugs than the one admitted to by SI so far.

Oct 6, 2017 11:26 PM #

"making the incorrect punch indiscernible from a correct punch to the athlete. "

There we go again. The athlete has punched "incorrectly". If only they were more discerning. Then they might have discerned.

The athlete has not punched incorrectly.

Your technology has failed.

Please rewrite that press statement so that "incorrect" and "correct" are removed.

The athlete punched correctly. Your technology is at fault.

Oct 6, 2017 11:59 PM #

Even if the punch was "too fast", then the design of the system is at fault. You should not be given an "all clear" beep if the technology has not recorded the punch. I suspect SI didn't mean to do it, but the press release is an admission of a fatal design flaw rather than just a software bug.

Oct 7, 2017 12:10 AM #

You're going to get problems in any system, no matter how apparently simple it seems conceptually nor how fail safe or mathematically proven it is.

I think the problem is in the way we've adopted and interpreted the technology. The technology in itself has been massively beneficial ... but, for the greater good, some individuals have ended up on the wrong side of it. Is that acceptable? Not sure. I'm still going to use my SI card tomorrow. But am I happy? No.

Oct 7, 2017 12:20 AM #

rm:

In my long experience, SI tends to doubt and then minimize, which I hear an echo of in the press release. (In the first major North American event to use SI, the units exhibited problems that eventually turned out to be due to cell phones and due to dry air with electrical storms. A year later, when it happened in Europe, they finally believed and fixed it. Yeah, making systems work perfectly is hard, but their verbiage in the press release is not serving them well. Just say, we discovered and are fixing a bug, like the big tech firms do. Minimizing just makes everyone doubt everything else you say a bit.) But as has been said, given that systems can probably never be relied on to be 100% perfect, it makes sense to use backup means, at least for important events. SI's backup is difficult operationally; perhaps it could be made less onerous, and less delaying of the official results, by some means.

Oct 7, 2017 1:20 AM #

Juffy:

There we go again. The athlete has punched "incorrectly". If only they were more discerning. Then they might have discerned.

Settle down.

The first part of that sentence, which you seem to have skipped over in your valiant attempt to make SI look like they're victim-blaming, is "Because the station erroneously treats the punch as successful, it also gives feedback..."

Oct 7, 2017 7:58 AM #

Settled down. Yes, you are right - I read too much into it. Nonetheless I still don't like the language used.

By definition a punch where the beep beeps and the flash flashes is correct. If that is not recorded it is the technology that is incorrect, not the punch.

There's been quite a bit of "victim blaming" over the years. If you don't have a punch recorded then there cannot have been a beep and a flash. Period.

SI are now saying that this is not currently the case. Given that background I think the statement could have been worded more judiciously.

Oct 7, 2017 10:57 AM #

The story at the finish was: yes, there was a beep and flash, but you punched too quickly. Illogical. We fell for it.

Oct 7, 2017 9:16 PM #

SI are now saying that this is not currently the case
It has never been the case.
About 1 in 10000 times an overwriting fault happens and the card unit is corrupted. That's a pretty low rate and realistically there's not much to be done about it except with flexible rules. pi listed the possible outcomes above.

What they are now saying is that a (new) software error also causes an effect where too-fast punching still generates a flash and beep. They identified their mistake and are fixing it.

Oct 8, 2017 12:12 AM #

origamiguy:

I've mentioned before that it would be good to have a program for a tablet or phone that could use a master station to read a control in the field. That way the control would not have to be brought back to be checked, just have the checker radio back whether the questioned punch was read.

Oct 8, 2017 6:21 AM #

pi:

@graeme "About 1 in 10000 times an overwriting fault happens and the card unit is corrupted."

Source for this?

Oct 8, 2017 9:18 AM #

undy:

@graeme - the rate is orders of magnitude lower or the system would be unusable.

Concern is that the backup system (read the unit) doesn't exist at all when using SIAir. Carrying two cards would be feasible for really important events.

Great work by the people chasing down these bug events.

Personally I used to get dq'd more regularly for punching irregularities in the card punch days.

Oct 8, 2017 12:38 PM #

About 1 in 10000 times an overwriting fault happens

If in the 25-manna relay we have 350 teams of 25 runners and on average 15 controls per leg, that makes 131250 punches. 131250 / 10 000 = 13 bug incidents. 13 teams disqualified would mean about 4% of teams are disqualified because of this particular bug. Maybe some think a 1/10000 rate is low, but the disqualified 4% may see it differently.
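
The arithmetic above written out as a small check. It uses only the figures from the post and assumes each failure hits a different team and that the 1-in-10000 rate applies per punch. The variable names are illustrative.

```python
teams = 350
legs_per_team = 25          # 25-manna: 25 runners per team
controls_per_leg = 15       # average, as assumed in the post
punches_per_failure = 10_000

punches = teams * legs_per_team * controls_per_leg
expected_failures = punches / punches_per_failure
share_of_teams = expected_failures / teams  # assumes one failure per affected team

print(punches)                       # 131250
print(expected_failures)             # 13.125
print(round(share_of_teams * 100, 2))  # 3.75
```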

I wonder why it does not read the slot before it writes to it, to make sure it is empty. A marketing issue, I guess: it would make punching slower and drain the battery faster, so it would be bad for business.

Oct 8, 2017 1:52 PM #

@Jagge: I agree, an error rate similar to mapping scales (1:10K) would be very close to making the system unusable for any serious event. For 25-Manna the minimum requirement should be that the technical error dsq rate was well below one.

I.e. on the order of 1 in 500K punches would be much closer to acceptable, particularly if this particular error mode was possible to detect during card readout, but I guess the latter only happens when a wild write leaves some visible residue behind.

Re. pre-checking that the slot is empty: This only works if the CLEAR unit zeroes out everything, does it? I've assumed that CLEAR simply initializes the read/write pointers, i.e. just like erasing a file on a computer disk only marks the file as deleted (and normally puts the disk blocks back in the free pool.)
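
To illustrate the point about Clear, here is a toy comparison of the two possible behaviours. It assumes a card modelled as a list of slots plus one pointer; `clear_pointer_only`, `clear_full` and `checked_punch` are hypothetical names, and which behaviour real Clear units have is not established in this thread.

```python
def clear_pointer_only(card):
    """Clear that only resets the pointer; old punch data stays in the slots."""
    card["pointer"] = 0


def clear_full(card):
    """Clear that wipes every slot and resets the pointer."""
    card["slots"] = [None] * len(card["slots"])
    card["pointer"] = 0


def checked_punch(card, control):
    """Refuse to write (no beep) if the target slot is not empty."""
    pos = card["pointer"]
    if card["slots"][pos] is not None:
        return False
    card["slots"][pos] = control
    card["pointer"] = pos + 1
    return True


def used_card():
    # Card still holding punches from a previous race (made-up codes).
    return {"slots": [101, 102, 103, None, None, None], "pointer": 3}


a = used_card()
clear_pointer_only(a)
ok_after_pointer_only_clear = checked_punch(a, 31)  # False: stale data blocks it

b = used_card()
clear_full(b)
ok_after_full_clear = checked_punch(b, 31)          # True
checked_punch(b, 32)
b["pointer"] = 0                                    # simulate a rewound pointer
rewind_detected = not checked_punch(b, 33)          # True: slot 0 already holds 31
```

So in this model the empty-slot check is only usable if Clear really wipes the data. With a pointer-only Clear, every first punch would be rejected because of leftovers from the previous race.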

Oct 8, 2017 6:35 PM #

rm:

Clear takes so long that I assumed that it did more than pointers? Maybe it's worth someone in orienteering outside the company gaining a deep knowledge of how it works.

Oct 9, 2017 1:42 AM #

My clear is so slow I thought it was reinstalling the firmware at each event.

Oct 9, 2017 4:39 AM #

Juffy:

I've always assumed it was wiping all slots that had been written (according to the pointers) - the first Clear after a run is always slow, if you Clear it again without punching anything then it's faster.

It seems like the logical way to do it in the most efficient manner, too - of course it assumes your pointers are correct, which from this thread does not seem to be a given...

Oct 9, 2017 7:08 AM #

pi:

25-manna has now announced they collected all 260 units from the forest used in the relay on Saturday. They found 2 cases of the overwriting bug.

There were closer to 380 teams: 380 * 25 *15 ~ 140 000 punches.

So this one sample gives an error rate of 1 in 70 000. Still really high in my opinion.
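
The same estimate as a quick calculation, using only the numbers in this post (380 teams, 25 legs, about 15 controls per leg, 2 confirmed cases). The helper name `one_in` is illustrative, and two cases is of course a very small sample.

```python
def one_in(failures, trials):
    """Express a failure count as '1 in N' trials."""
    return trials / failures


teams, legs_per_team, controls_per_leg = 380, 25, 15
runs = teams * legs_per_team
punches = runs * controls_per_leg

print(punches)             # 142500
print(one_in(2, punches))  # 71250.0  (per punch)
print(one_in(2, runs))     # 4750.0   (per individual run)
```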

I really was intrigued by graeme's post earlier. It reads as if the overwriting bug is old news, something that has been officially acknowledged by SI already in the past. That their recent statement is for some other new bug. Any sources for this?

Oct 9, 2017 12:16 PM #

@pi@undy@graeme "About 1 in 10000 times an overwriting fault happens and the card unit is corrupted." Source for this?
Personal experience as organiser and controller. I should have added that I meant 1 in 10000 runs, not punches. These are cases where the system inexplicably failed in one way or another, not specifically this new bug, so saying "overwriting" was maybe sloppy. A few cases are very clear from TV coverage etc. (e.g. Hollie Orr at WOC) and I can think of a handful of other cases where I'm inclined to believe the athlete (including WCOC2014 when it was me!). I pay attention to maybe 20 events a year, so my knowledge is of about one fail per year.

Overwriting is old news. At least 10 years ago our JK-relay runner was missing something like nine consecutive controls after the radio (and in a relay pack was seen punching at all of them). He was reinstated, obviously, and (I think they checked that) the punches were recorded in the boxes. Eventually SI identified the problem as moving the pointer back, which made sense.

1:10000 is rare compared with compass, control-placement, ankle, knee, Emit etc. "malfunctions". Not to mention brain ones.

Oct 9, 2017 1:27 PM #

It seems like graeme's estimate of a 1:10k error rate (for full races) is in the same ballpark as the 2 errors detected at 25-manna, which had 380x25 = 9500 runners. (This would obviously also depend on the average number of controls in each race.)

I.e. with an expected error rate of 1:10K, an event with 10K runners should see zero or one such bugs with approximately equal probability, and two about half as often.
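
A small check of those probabilities, assuming failures are independent and rare enough to be modelled as a Poisson process. That modelling choice is an assumption, not something established in the thread. The figures are the 9500 runs and the 1-in-10000 per-run rate mentioned above.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


runs = 380 * 25                 # 9500 individual runs
lam = runs / 10_000             # expected number of failures: 0.95

probs = {k: poisson_pmf(k, lam) for k in range(3)}
two_or_more = 1 - probs[0] - probs[1]

for k, p in probs.items():
    print(k, round(p, 3))       # 0 0.387 / 1 0.367 / 2 0.175
print(round(two_or_more, 3))    # 0.246
```

Under these assumptions, seeing two or more cases in one event has roughly a one-in-four chance, so the 25-manna result is compatible with a 1-in-10000 per-run rate without confirming it.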

Oct 9, 2017 1:59 PM #

rm:

The problem isn't so much that the system has a small error rate; it's that the backup system involves running into the woods to check the units, while the awards are delayed, say, an hour. Or you just give the award to someone, and fix it later. And the extra effort of downloading the units to check. And the head scratching to figure out whether the orienteer punched too fast, or maybe there was a beep and flash.

Reducing the effort (and arcane knowledge) involved in the backup (and frankly primary) system seems to be the real upshot.

Oct 9, 2017 3:22 PM #

Cristina:

I'm appreciating more and more the simplicity of a little rectangle of paper that gets imprinted with holes as you punch.

Oct 9, 2017 3:36 PM #

@Cristina Error rate was ~1 in 100.
(Based on controller having to typically check 4-5 punches per event. Those which the checking team couldn't evaluate, and interminable arguments about whether all the punches had to be in/out of the box/card, dents vs full penetration, punches taken out of order etc. etc.)

Personally, I prefer "touch-free honesty punching". After all, for most events, who are you cheating but yourself?

Oct 9, 2017 4:57 PM #

Cristina:

I was thinking of the Emit backup system...

Oct 9, 2017 5:33 PM #

And no, it does not fall off. Because you tape it.

Oct 9, 2017 10:56 PM #

ndobbs:

how many events can you run with your emit before the battery dies?

Oct 10, 2017 4:52 AM #

Good question. The battery in my vintage one (not in active use any more) from the late 90s hasn't died yet. Some more recent ones seem to die after one year. Battery quality seems to vary. I do a freezer test once a year; if it works despite freezing, there is hope for one more season. But better to tape on the backup paper anyway, you never know.

Oct 10, 2017 8:44 AM #

Ah the emit back up system. "Well done, you did the course correctly, but we have no idea how long it took, so we're listing you as DQ." I don't own an Emit, but have three personal experiences of this with hired units in less than 100 events.

Oct 10, 2017 9:29 AM #

The clock running in the Emit card is not used for timing. It does not matter whether the card is dead or not, you get a time. It is just a punching system, and the clock is used only for splits.

Here, if you finish with a dead card, they type in your estimated finish time at first, and if the punches are OK they check the correct time from video.

If you use it for timing you are using it wrong. Such a method is allowed here only for training sessions. There are plenty of reasons; for example, clock speed varies across cards, and if you have a slow clock you may get a few seconds' advantage over rivals. You have to use the same clock for all competitors. You can't blame the system if you misuse it.

Oct 10, 2017 9:47 AM #

Reading for graeme http://www.suunnistusliitto.fi/kilpailu/emit-ja-tu...

Oct 10, 2017 11:17 AM #

@Jagge. You need to tell the organisers, not the competitors. If I can't get reinstated despite a correct set of pin punches that's not me misusing the system.
I even tried it once, only to be told "Ah, but you can't prove you visited the controls in the right order"

Oct 10, 2017 11:56 AM #

True. I'd say that's what an O federation should do: give guidelines and instructions on how to do these things and how not to, no matter what system is used.

At Idre 3-days 2012 I lost my SI card right after the first control. I searched for it for 5 minutes and then went back to #1 and pin punched all the way to the finish. No pin punch at the last control and no finish punch because of the missing card. But I got a result.

I know of one person being disqualified for having no proof that the central control of a butterfly was visited every time. Pin marks on top of each other could not be identified...

The way it used to be here was to not allow anyone to start with a dead Emit card (the clear unit at the start should detect it). So if it dies, it dies during the race. And the runner can't know it has died (there is no feedback), so the runner can't figure out that he could take controls in the wrong order; so if the pin marks are there we can assume all is good. New feedback versions of Emit gear spoiled all this, of course. Now the runner can see it is not working and may either lose time trying again/harder to punch or even cheat with the control order. Damn feedback, cancer of e-punching.

Oct 10, 2017 12:02 PM #

pi:

"Damn feedback, cancer of e-punching"

+1

Oct 10, 2017 12:30 PM #

At Idre 3-days 2012...
Same happened to me at a French 5-day event in about 2001. Except I got reinstated with the finish time of "when they reinstated me" - a result of just over 10 hours! I had fun trying to get out of last place, which I managed on the final day :) As you can see, it turned out to be much more memorable/anecdotal fun than some mid-pack finish would have been.

Just thought I'd quote Ricky from above, since he's so right.
The issue is not with the technology - it is with the rules and their interpretation, or lack of.
Also, I would say that SOFT/25-manna seems exemplary. Technical fault, reinstated on backup, fully explained with feedback given to SI to find a bug and improve the system. Compared with IOF "computer says no".

Oct 10, 2017 1:07 PM #

@graeme: It is pretty obvious that any system can be abused, but failing to reinstate a runner with dead EMIT card and a full set of backup punches is just brain dead. Like I said I have personally had two EMIT cards that worked OK before the event but failed during the race, and both times the backup pin holes got me back on the result list after a ~5 min wait in the Red Zone.

I like to remind my Swedish friends about the main difference between Swedish and Norwegian competition rules: At the same time Sweden enacted absolute rules like "3 controls with a pin outside the designated rectangle => DSQ with no appeal possible", Norway went in the opposite direction and said basically:

"If you cannot prove that you have visited the controls correctly you may be DSQed, but the organizer is always free to reinstate any runner if they find it likely that the competitor has actually visited all the controls and not tried to gain an unfair advantage".

I.e. you should never assume you can handle all situations with strict rules, instead you leave as much as possible up to the organizers and/or jury.

Oct 10, 2017 1:36 PM #

@Terje, While I might agree with you, especially about juries, I never found describing the organisers as brain-dead a helpful way towards reinstatement :) I have never organised events with EMIT and my Finnish wasn't up to reading Jagge's link (so maybe I'm brain-dead) - how should they have got the start/finish times?

There was an interesting SIair case at the British Relays this year. The first team to finish had a missing punch at a spectator control. The runner was seen racing through the control by hundreds of people and claimed to have heard flash/beep (not confirmed by others). Jury confirmed the DQ should stand. Would you agree with that decision? Would it matter if the winning margin was 1 second or 1 minute? Would you place any weight on the fact that the unit was OK for everyone else until it failed an hour later?

Oct 10, 2017 3:46 PM #

@graeme: Sorry about my "brain-dead" designation, what I'm trying to advocate against is all those who attempt to make exact rules to cover all eventualities, since this starts with the assumption that the rules should be applied by "brain-dead" organizers, i.e. with absolutely no room for "common sense".

(as an aside, I'd like to note that common sense is not very common :-()

I've been a programmer for ~40 years and I recognize the desire to come up with a fixed algorithm for all problems, but experience has shown me that it is always better to leave yourself (or your users) loopholes where they can override the default behavior. I have not taken part in very many events in the UK, but enough to notice that you seem to have the Swedish desire for absolute rules.

Re. your final relay example: That racer would almost certainly have been accepted in Norway if witnesses could testify that the racer had tried to punch, i.e. the opposite to what has happened when teams have been DSQed on Tiomila (or Jukola?) for a missed punch on a TV control where the video clearly showed them punching.

Oct 12, 2017 1:36 PM #

Regarding the SI bugs under discussion, I think we're in agreement that there are 2 (at least) different problems. The first, that SI has acknowledged, is the one where beep/flash is occasionally given before the punch is properly recorded in the stick. Offhand that one sounds easy enough to track down and fix with a firmware update.

The second, that Peter L is investigating and Graeme and others have offered statistics on frequency of occurrence, sounds clearly like the "next slot" pointer is being corrupted on rare occasion. My hunch is that this isn't so much a statistical 1 in 10000 type of event but that there is a confounding factor that comes into play. Graeme suggested possibly marginal battery voltage. Maybe so but if it were that simple I would expect to see more than one type of associated failure.

One scenario where I could imagine a corrupt "next slot" pointer occurring is in large events, particularly relays, when a long line of runners arrives at a control and the unit is punched repeatedly with very little time between. A lot depends on the type of hardware and a lot more on how the software is written, but I imagine something like this is far more likely to trigger a corrupt pointer than the lower demand cycle typical of interval start races where large packs are less likely. Groups still happen there but much smaller on the whole. It's hard to tell just from what's in this thread but have all of the reported cases of corrupt pointer happened in large relays when a pack arrived at a control?

Also, it seems to me that the likely trigger scenarios for both of these bugs would be eliminated by moving to SI Air. There if I'm understanding correctly the control unit just broadcasts its information but does not actually manipulate the memory in the stick. Is this correct?

Oct 12, 2017 5:44 PM #

One scenario where I could imagine a corrupt "next slot" pointer occurring is in large events, particularly relays, when a long line of runners arrives at a control and the unit is punched repeatedly with very little time between.
Agreed. But this is also the case when a genuine too-fast/user-error punch is most likely, due to rushing, jostling, hearing other people's beeps etc.

Oct 12, 2017 6:16 PM #

Yes. And one possibility I'm thinking of is that the two might not be entirely unrelated. This is speculation, but imagine a "too-fast" error followed rapidly by another stick arriving in the hole. It's conceivable, depending on hw/sw, that the next slot number from one stick gets accidentally written into the next slot pointer of the next stick. This could be one of those situations that's hard to reproduce in the lab if you don't think to test for it. (And maybe even if you do.)

In any event the two errors have different results in the stick memory that are easy to differentiate.

Oct 13, 2017 5:03 PM #

@graeme: I like your suggestion to look at high use rates: There are probably a few caps on the circuit board in order to even out any voltage ripples, and/or in order to drive the radio when a stick is detected in the hole.

Since that radio has to be strong enough to wake up the chip and allow it to first be read out and then to be programmed, it is quite likely that the peak power usage is more than the rechargeable battery can supply continuously. Undervoltage is a known error mode for flash drives so it could also cause some of these errors.

Oct 14, 2017 10:18 PM #

AZ:

I wish I knew more about the workings of the SI. So, not knowing much, but having a background in realtime (fault-tolerant) computing, here are a few ideas that might be able to help track down this problem...

1. At clear, set all stick data to 0.
2. As suggested, use two index pointers
3. Reserve one bit somewhere in the stick as a 'SI error detected' flag (at least)
4. At each punch, compare the two pointers, if they are not equal - fault detected
5. At each punch, check value being overwritten. If not zero - fault detected.
6. Write punch number / code to the 'most reasonable' of the two pointers
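
The six steps above can be sketched as a toy model. Everything here is an assumption made for illustration: the `Card` layout, the 8-slot capacity, and in particular the rule for picking the "most reasonable" pointer (an empty slot directly after the last recorded punch). It is not how SI cards actually store data.

```python
from dataclasses import dataclass, field
from typing import List

SLOTS = 8  # hypothetical capacity, chosen only for the example


@dataclass
class Card:
    """Toy card model; this layout is an assumption, not the real SI format."""
    slots: List[int] = field(default_factory=lambda: [0] * SLOTS)  # 1: cleared to 0
    ptr_a: int = 0            # 2: primary "next slot" pointer
    ptr_b: int = 0            # 2: redundant copy of the pointer
    error_flag: bool = False  # 3: sticky "error detected" bit


def plausible(card: Card, p: int) -> bool:
    """A pointer is plausible if it names an empty slot right after the last punch."""
    if not 0 <= p < SLOTS or card.slots[p] != 0:  # 5: never overwrite data
        return False
    return p == 0 or card.slots[p - 1] != 0


def punch(card: Card, code: int) -> bool:
    """Store a non-zero control code; return True only if no fault was seen."""
    fault = card.ptr_a != card.ptr_b  # 4: compare the two pointers
    good = [p for p in (card.ptr_a, card.ptr_b) if plausible(card, p)]
    if not good:  # neither pointer is usable: flag it and write nothing
        card.error_flag = True
        return False
    target = good[0]  # 6: write to the most reasonable pointer
    card.slots[target] = code
    card.ptr_a = card.ptr_b = target + 1  # resynchronise both pointers
    card.error_flag = card.error_flag or fault
    return not fault


card = Card()
punch(card, 31)
punch(card, 32)
card.ptr_a = 0  # simulate a corrupted pointer
ok = punch(card, 33)
print(ok, card.slots[:3], card.error_flag)
```

In this run the third punch reports a fault and sets the error bit, yet all three codes survive in order, so the fault is visible at download instead of showing up as a missing punch.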

More data should be recorded in the unit (or, ideally, in the stick) about the state of the stick & the state of the SI control. Also, if a fault is detected the SI box *could* do something crazy like beeping an alarm sound. This would not be great because the runner would have to repunch, but better than a DQ for most of us.

Perhaps stuff like this is already being done.

Oct 15, 2017 4:22 PM #

@AZ: Among your suggestions, #1 is pretty obvious, and probably already implemented.

All the rest fight against the main desire of the developers of the last 10+ years:
"How do we make it faster?"

On the one hand, if you can make the punching process fast enough (i.e. a few milliseconds), then it would become impossible to push the stick all the way into the hole and retract it again without getting a valid punch.

On the other hand, this would however lead to orienteers discovering that even a partial insertion would almost always be enough, so you would still have a problem with sticks that almost but not quite managed to finalize the process

The key issue is still the well-known programming bug called a "race condition", typically caused by badly interlocked processes: It should by definition be impossible to get both beep & flash before the punching process has been verified as complete.
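
As a minimal sketch of that ordering (write, read back, and only then signal), assuming a made-up `FlakyCard` class whose writes can be lost silently. This shows the interlock idea only; it does not reproduce a real race, and it says nothing about how SI firmware is actually structured.

```python
class FlakyCard:
    """Toy card whose writes can be made to fail silently (hypothetical)."""

    def __init__(self, drop_writes: int = 0):
        self.memory = None
        self.drop_writes = drop_writes  # number of writes to lose silently

    def write(self, record):
        if self.drop_writes > 0:
            self.drop_writes -= 1
            return
        self.memory = record

    def read(self):
        return self.memory


def punch(card, record, signal) -> bool:
    """Write, read back, and call signal() only once the data is confirmed."""
    card.write(record)
    if card.read() != record:
        return False  # no beep/flash, so the runner knows to punch again
    signal()
    return True


beeps = []
card = FlakyCard(drop_writes=1)
first = punch(card, (31, "10:02:15"), lambda: beeps.append("beep"))
second = punch(card, (31, "10:02:15"), lambda: beeps.append("beep"))
print(first, second, beeps)
```

The lost first write produces no feedback at all, and the single beep comes only from the second attempt, after the read-back matched.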

Oct 16, 2017 12:57 AM #

AZ:

@Terje

I admit I have no knowledge of the technology they are using, but how much slower is this going to make things? If everything is working to me this looks like minimal stuff: read three values from the SI stick instead of one (two pointers plus a time/code value, versus one pointer), and then doing two extra comparisons (that the two pointers match and that the time/code value is zero).

Fast, good, or cheap - pick any two (famous software development choice). I'd rather have good & cheap in this case ;-) Making the SI work more reliably is much more important to me than having it work fast. And to make it work more reliably I suspect they need to put more "defensive coding" into play.

(PS: This problem might be caused by a race condition. But I have always been reminded of one of my impossible-to-find bugs - it took almost three years to find. It was that an interrupt handler didn't properly restore the pre-interrupt state when it exited (one - just one - register was not restored properly). This created very similar symptoms to what we are seeing with SI - almost random and relatively infrequent failures.)

Oct 16, 2017 7:52 AM #

pi:

Pure speculation of course but I would say it's likely that SI already has redundancy strategies in the firmware. It may not be about "coding" at all. There is an electronic circuit board design that has to handle real-time events. There may be components on the board that are slightly out-of-spec, marginal soldering not detected in testing, timing of events and interrupts, uneven or low power supply etc etc. In rare circumstances these types of hardware problems may lead to random or "impossible" undefined behavior.

If we are to believe PL's report SI has not found the root cause of the overwriting bug. In that case it may not help at all to rewrite the firmware.

Oct 16, 2017 8:28 AM #

@AZ: Broken IRQ handler? Oh, I feel for you, those can indeed be horribly hard to find if the interrupt is rare. If I had to guess I would suspect that the register was the flags pseudo-reg, since an error here is the one most likely to not crash the machine more or less at once.

@pi: Your idea here is similar to my previous suggestion that power supply issues could cause these kinds of "impossible" situations. If SI still hasn't located the bug then it cannot be as obvious as an unlocked semaphore update which gates the feedback signals.

Oct 16, 2017 9:40 AM #

AZ:

My (uneducated) suggestion is aimed not at fixing the bug, but rather just at detecting it at the moment it happens. If it can be detected, then more information can be compiled about the circumstances, which might help track it down. As a bonus, if it is detected in real time then, as not ideal as it is, the runner can be alerted to the problem (and perhaps suffer a two or three second penalty for a re-punch which is better than a DQ ;)

PS: I'm happy that this is finally being openly talked about, instead of perpetuating the "you must have punched too fast" story. The way I look at it, accepting there is a problem in the equipment that may from time to time cause unfair results, isn't too much different from having a human referee who can make a game-changing bad call. You have to live with what happened, while at the same time work on improving the referee's future behaviour (eg: with video replay, in the example, or with improved error detection in the case of SI ;-)

Oct 16, 2017 10:08 AM #

AZ:

[One thing I think I'm implying is that the current system of checking that a punch is valid prior to having the SI unit "beep and flash" is not a very thorough check. Perhaps it is just a check that the writing is finished, but with no check that the contents of the stick are correct.]

Oct 16, 2017 3:11 PM #

jjcote:

In my experience, rare errors like this tend to be the result of uninitialized variables that cause trouble only when they happen to contain some particular garbage value by coincidence. But with no knowledge of the inner workings of the SI hardware, my experience is good for nothing but idle speculation.

Oct 16, 2017 4:21 PM #

@Terje re. partial insertion
Here's a quote from (then) world champion Marten Bostrom...

I consider the punching technique a crucial part of becoming a champion. For races utilizing Emit I have developed a technique where I don’t always get a backup pin mark in the piece of cardboard which is attached to the card. I have learnt how long to keep the Emit card at the control unit to get a successful punch, but...
I don't, personally, think this is cheating. But when runners confess to this, then accusing others that "you must have punched too fast" is a bit lame.

Oct 16, 2017 4:57 PM #

Remember there are TWO bugs at issue. One, of the punched-too-fast variety that is erroneously causing beep/flash to occur before data is properly recorded in SI card. The other, where the card's next slot pointer is being corrupted. The firmware rewrite is aimed at bug #1. Bug #2 is not yet acknowledged by SI.

The AZ suggestions to detect Bug 2 and give an error signal, thus allowing the opportunity to punch again, I don't see helping. If the pointer is corrupted then either a) you are overwriting previously used slots and a re-punch doesn't help, or else b) you've jumped ahead and, well, a re-punch doesn't help. Also, to get a "most reasonable" decision from multiple pointers would be easier with 3 than with 2. (Think various distributed systems that use the concept of "quorum".)
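
A minimal sketch of the quorum idea, assuming the pointer is simply stored in several copies and a strict majority wins. The function name `vote` is made up for the example.

```python
from collections import Counter
from typing import Optional, Sequence


def vote(pointers: Sequence[int]) -> Optional[int]:
    """Return the value held by a strict majority of the copies, else None."""
    if not pointers:
        return None
    value, count = Counter(pointers).most_common(1)[0]
    return value if 2 * count > len(pointers) else None


print(vote([5, 2]))     # two copies disagree: no way to pick a winner
print(vote([5, 5, 2]))  # three copies: one corrupted copy is outvoted
```

With two copies a disagreement can only be detected; with three, a single corrupted copy can also be corrected.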

Also, given the likelihood that the bug is being triggered when a train of relay runners descend on a control and punch it repeatedly as fast as they can get near it, what exactly would you expect them to do if the control suddenly started emitting an error signal during the punching melee? Will they even know who triggered it?

Oct 16, 2017 5:42 PM #

AZ:

I think you are misinterpreting my logic. My point is for SI to collect more data about when the error occurs, to help in fixing the bug. My suggestion for an error alarm and repunch is not the main objective.

(and also for the SI stick to have an 'error bit' set in it, so that at download the runner can be identified as likely having suffered an SI bug. Wouldn't that be cool? :-)

But, yes, the repunch is probably not going to work ;-)

(also, I question the contention that the bug is being triggered during a train of relay runners - is that shown to be the case?)

Oct 16, 2017 9:50 PM #

AZ:

One, of the punched-too-fast variety that is erroneously causing beep/flash to occur before data is properly recorded in SI card. The other, where the card's next slot pointer is being corrupted. The firmware rewrite is aimed at bug #1. Bug #2 is not yet acknowledged by SI.

Exactly! And this is really disappointing. I suspect Bug2 is much more common, and if not more common then at least way more unacceptable IMHO.

Oct 17, 2017 11:49 PM #

Something happened at my second control last weekend. I visited the site (gps confirmation). In fact, it was a strange loop course and I had to visit the control a second time and I did so partly from memory of the first visit. I distinctly remember getting the beep feedback (though I wasn't watching for a flash). The result wasn't recorded in my stick. Mispunch result. I interrogated the box that night and found no sign of my first visit, though the second was happily there. There was one unusual complication. Most boxes had not been advanced to daylight savings, but the computer and the previous box had been. And I started early which might have placed my visits before the zero time in standard time reckoning. This is my second experience of these sorts of strange mispunches. Losing confidence.

Oct 18, 2017 1:35 AM #

rm:

But with no knowledge of the inner workings of the SI hardware, my experience is good for nothing but idle speculation.

This is at the essence. One either accepts the assurances of the vendor, or not, pretty much. If we want a system that can be independently verified, we'd need volunteers to create an open source system (firmware, hardware, software). I doubt that there's sufficient interest in that. (Admittedly, Open Orienteering Mapper and Purple Pen have turned out well, but a high reliability real time system for punching, splits and timing may be more work than available for free.)

Oct 18, 2017 7:16 AM #

Maybe.. but I know of at least one independent non commercial timing system.

Oct 18, 2017 7:17 AM #

@JimBaker: There was a demonstration of a new (i.e. third) system during WOC2016 in Sweden.

@graeme: Re. your Boström quote: Yes, being able to punch quickly is of course important, at least for sprint races, but Middle often comes down to a single second or two as well.

The key is that if you try to develop an EMIT technique that doesn't include getting the backup pin, i.e. not really pushing your brick all the way down, then you are making an intentional bet, and you know about it.

On the same Norwegian champs which my club hosted in 2012 we had (by far!) the most DSQs in H19-20, i.e. the oldest juniors. This was due to several of these guys doing the same "quick punch" bet, and they knew they were doing it, so when I told them they were missing one or two controls but nearly half the backup pin holes they accepted the DSQ status immediately.

Oct 18, 2017 6:38 PM #

AI-aka-nerimka:

AFAIK batteries for SI stations must provide high peak current. That's a key specification: an SI station does not work continuously (like a watch or headlamp), but in pulses.
I think that organizers of affected events should provide the history of the affected SI stations (brand, type and working hours/punches since installed).
If this is the case, SI station firmware needs to be adjusted to test the battery, e.g. in a few different modes, for big, middle and small events. A given SI station may withstand 10 small events but be prone to failure at one big relay.

Oct 18, 2017 6:43 PM #

rm:

I recall Emit techniques that disregarded the paper card entirely, tapping the end of the Emit in the hand on the middle of the Emit on the stand. With Emit units that have an LED that flashes upon a punch, that sufficed. (But, of course, there's no backup for a failure of the kind where a flash occurs but a punch is not recorded. I don't know whether those have happened with Emit. Also doesn't protect in the case of the Emit battery failing during the event.) Some local events with Emit dispensed with paper backup entirely (but, of course, it's not so serious if a local event result is lost).

Oct 18, 2017 9:04 PM #

jjcote:

It's an unfortunate situation we're in, where there's such an issue about something that has nothing at all to do with the essence of the sport. Orienteering is about reading a map and moving quickly through terrain, not about punching.

Oct 18, 2017 9:12 PM #

bmay:

something that has nothing at all to do with the essence of the sport

I guess we should just use an honour system where each athlete is responsible for touching the flag. And, time themselves on their wrist-watch. That ought to work :-).

It seems to me that "proving you've been to the controls" and "determining how long it took to do the course" are fairly fundamental.

Oct 19, 2017 12:27 AM #

rm:

I'm fine with most events being honor system and self time if at all. Most of the time it's just exercise, training, something to do on the weekend. Occasionally it's a championship or such. I think that we time way too much (and have simple training way too infrequently). But I lost that discussion with my local club, who want to keep timing every local event, even though too few people want to volunteer for it.

Oct 19, 2017 12:30 AM #

That refrain sounds familiar. The solution (?) is to let the community find its equilibrium. Those that love the electronic timing have the motivation to do the work. When they reach their limit, and others don't volunteer, you have found the happy mix of training and timed events.

Oct 19, 2017 5:11 AM #

After describing my "mispunch" experience on Saturday on this thread, I received a pleasant email from Simon at SI asking for the following-
"If possible, I would be really interested in a readout of the station backup (standard CSV file from SIConfig+ is fine) *plus* a download of your chip – both as a human readable format (CSV or PDF of the chip download) AND the “raw binary data” (as can be exported from SIConfig+ in the “read cards” function."
It might be a good practice to save and send this whenever one of these 1:10+k probability incidents occurs.

Oct 19, 2017 5:28 AM #

gruver:

Jim wrote: I think that we time way too much (and have simple training way too infrequently)

Agree. www.mapsport.co.nz/umax

Agree with Log as well. If you feel obligated to fill that gap but would rather not: DON'T

Oct 19, 2017 8:44 AM #

Split times are so last decade anyway. Pin punching with gps tracks would be much more interesting for post race analysis. And you would not need to check punch cards, just archive and publish photos of them. And check only if something is suspected.

Oct 19, 2017 6:05 PM #

@bmay "proving you've been to the controls" fairly fundamental.

So it would be nice to have some consistent rules about how to do that.

With emit, even if you admit to not punching correctly, but there's some evidence you were there, it's fine (e.g. Bostrom). With SI, even if you're recorded in the box, or on TV punching correctly, you get DQ'ed (e.g. Orr).

Yes, there are rare technical glitches, but the main problem is this inconsistency in the rules.

Oct 19, 2017 8:07 PM #

That "some evidence" is the normal electronic punch mark.

With both systems you either return from the forest with some kind of punch mark in your card and you are OK, or without and you are disqualified. It is as simple as that really, I don't see any inconsistency there.

It would be inconsistent to qualify lucky ones who punch too fast in front of TV camera and disqualify unlucky ones who do the same outside TV coverage.

i.e. I fail to see how the main problem is inconsistency in the rules. I'd say the main problem is a constant rate of technical glitches affecting a system with no backup (or a difficult to use backup). No-brainer really.

Oct 23, 2017 8:13 PM #

email from SI last night-

" this looks very much like the error that we already know of – the card is missing a punch because the “punch-pointer” was not written by the card even though the station instructed it to do so, and therefore the next control overwrote the punch. (Yes, of course we could “read before write”, and very likely will do so in the future, the errors we are seeing are new and really rather mysterious to us too)."

So the pointer issue is already understood and being worked on. Something else is going on. I am glad I don't have to debug this.
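
The failure described in the email can be reproduced in a toy model. This is a sketch only: the slot list, the pointer handling and the `pointer_write_ok` switch are assumptions for illustration, not the real card protocol.

```python
def punch(slots, ptr, code, pointer_write_ok=True, read_before_write=False):
    """Write code at ptr and return the new pointer value.

    pointer_write_ok=False models the card failing to store the advanced pointer.
    """
    if read_before_write and slots[ptr] != 0:
        raise RuntimeError(f"slot {ptr} already holds {slots[ptr]}")
    slots[ptr] = code
    return ptr + 1 if pointer_write_ok else ptr


slots, ptr = [0] * 4, 0
ptr = punch(slots, ptr, 31)
ptr = punch(slots, ptr, 32, pointer_write_ok=False)  # pointer update lost
ptr = punch(slots, ptr, 33)                          # lands on top of 32
print(slots)
```

Control 32 disappears from the card although it was punched. With `read_before_write=True` the third call would raise instead of silently overwriting, which is the kind of check the email mentions.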

Jul 31, 2018 10:00 AM #

For this bug O-Ringen had a 200 SEK protest fee this year: those who wanted a forest unit examined had to pay 200 SEK. If the missing punch was found there, the fee was returned. Apparently the fee was meant to reduce the amount of checking, i.e. fewer unnecessary inspections. Based on posts on orienterare.nu there were several overwrite bug cases every day: mark missing, protest made, punch found in the forest unit and fee returned. Most likely there still are people who are not aware of this or for other reasons did not bother to protest, so hardly all bug cases were found.

From the "O-Ringen programtidning" (event programme):
"Protests against the competition management's decision on a reported rule violation must be submitted in writing to the Red Exit ("Röd utgång") in the finish tent no later than two hours after the decision has been presented.
Protest fee 200 SEK, refunded if the protest is upheld."

Aug 1, 2018 1:47 AM #

I suspect this would have created interesting times at the "Problems" desk.

Aug 1, 2018 6:40 AM #

pi:

Any news from SI? Did they rewrite the firmware now?

Aug 1, 2018 12:49 PM #

gruver:

The World MTBO Champs is equipping all riders with a second SIAC, which will be used in "exceptional circumstances". But this may be something to do with air punching or high speeds rather than the bug. The stations used have a range of 1.8m.

Aug 1, 2018 1:20 PM #