Engineers unravel the mystery behind periods disappearing from email text
In written language such as email, the period ('.') plays an important role: it marks the end of a sentence and serves as a separator in numbers.
The curious case of the missing period - Tjaart's Substack
https://tjaart.substack.com/p/the-curious-case-of-the-missing-period
In 2016, Tjaart was working on a project to build a solution for a client that would allow them to consolidate all their document templates into one system. The solution they developed allowed the client to centrally manage all templates used to create PDF documents, text messages, and email bodies.
A few months after the solution was released, Tjaart and his team received a report from one of the users of the system that 'one of the emails they sent to a customer was missing a period in the body of the email.' They also reported that 'the missing period only occurred to a specific customer, and the same email did not miss a period when sent to another customer.'
Below is an example of a correctly sent email, with a period immediately following 'family.'
On the other hand, in the image below, the period immediately after 'family' is missing.
When Tjaart looked at the source code of the email, he found that the supposedly missing period was in fact present. He quickly copied the source of the template used in production into a local version of the software and generated a preview of the email body. The period was not missing in the preview, nor when he printed out the body template.
Tjaart and his team were scratching their heads, but when they changed the placeholder in the code to the name of the customer who was allegedly 'missing a period,' they confirmed that the period was indeed missing, as reported.
As a result, it was discovered that when emails with certain content were sent to certain customers, periods would be missing. Further investigation revealed that when the position of a period was changed within the template, the period would reappear in the body of the email.
Tjaart and his team began debugging, stepping through the code that creates an email and saves it to a local database. That code adds the email to the database marked as 'to be sent,' and a cron job periodically picks up the pending emails and sends them.
Attention then turned to the code called by this cron job, part of which was an SMTP client implementation carried over from an earlier project.
When Tjaart and his team looked at the code, they discovered that one of its functions ensures that each line in the body of the email does not exceed a certain number of characters: if a line in the body exceeds the character limit, a new line is created and the remaining text is moved to it.
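The line-splitting behavior described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual code: the names `wrap_line` and `wrap_body` and the limit of 22 characters are hypothetical, chosen so that the break falls right before a period.

```python
def wrap_line(line: str, limit: int) -> list[str]:
    """Hard-wrap one line into chunks of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be positive")
    if not line:
        return [""]
    # Cut purely by position; the content at the cut point is not inspected.
    return [line[i:i + limit] for i in range(0, len(line), limit)]


def wrap_body(body: str, limit: int) -> str:
    """Apply wrap_line to every CRLF-separated line of an email body."""
    wrapped: list[str] = []
    for line in body.split("\r\n"):
        wrapped.extend(wrap_line(line, limit))
    return "\r\n".join(wrapped)


if __name__ == "__main__":
    # Hypothetical body text; the limit of 22 is for illustration only.
    for line in wrap_body("Regards to your family. Bye", 22).split("\r\n"):
        print(repr(line))
```

Because the cut is made purely by position, whatever character happens to follow the limit, including a period, becomes the first character of the new line.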
The SMTP specification states that a line of text, including its terminating CRLF, can be up to 1000 characters long, and that the leading dot added for transparency does not count towards the 1000-character limit.
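As a rough illustration of that limit, the check below counts the octets of a line plus its CRLF. The function name `fits_smtp_line_limit` and the choice of UTF-8 encoding are assumptions made for this sketch; the line is measured before any leading dot is added, so that extra dot is not counted.

```python
MAX_TEXT_LINE_OCTETS = 1000  # limit includes the terminating CRLF


def fits_smtp_line_limit(line: str, encoding: str = "utf-8") -> bool:
    """Return True if `line` (given without its CRLF) fits the text-line limit.

    The line is measured before dot-stuffing, so a leading dot that is
    later duplicated for transparency does not count against the limit.
    """
    octets = len(line.encode(encoding)) + 2  # +2 for the CRLF terminator
    return octets <= MAX_TEXT_LINE_OCTETS


if __name__ == "__main__":
    print(fits_smtp_line_limit("a" * 998))  # 998 + 2 octets
    print(fits_smtp_line_limit("a" * 999))  # 999 + 2 octets
```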
By running the code multiple times, Tjaart discovered that the line following the one where the period was missing actually started with a period. The previous line had exceeded the character limit, so a new line was created and the period was moved to the beginning of that next line.
In the example below, the period appears correctly immediately after 'family.'
However, when the body of the email is formatted using the custom SMTP client, a period appears at the beginning of the fifth line of the email, and the period immediately following 'family' is missing.
When Tjaart checked the specifications of the SMTP client and the SMTP server, he found that for the SMTP client, 'Before sending a line of the email body, the SMTP client checks the first character of the line. If it is a period, an additional period is inserted at the beginning of the line,' and for the SMTP server, 'If the first character is a period and there are other characters on the line, the first character is removed.'
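The two rules quoted above can be sketched as a pair of small functions. The names `dot_stuff` and `dot_unstuff` and the sample lines are hypothetical; the sketch only shows why a client that skips the first rule loses a leading period, and is not the code from the project.

```python
def dot_stuff(lines: list[str]) -> list[str]:
    """Client side: prepend an extra period to any line starting with one."""
    return ["." + line if line.startswith(".") else line for line in lines]


def dot_unstuff(lines: list[str]) -> list[str]:
    """Server side: drop a leading period when other characters follow it."""
    return [
        line[1:] if line.startswith(".") and len(line) > 1 else line
        for line in lines
    ]


if __name__ == "__main__":
    # Hypothetical wrapped body in which a period landed at a line start.
    body = ["Regards to your family", ". Bye"]

    # A client that does not stuff: the server strips the real period.
    print(dot_unstuff(body))

    # A client that stuffs first: the round trip preserves the text.
    print(dot_unstuff(dot_stuff(body)))
```

With both halves in place the round trip leaves every line unchanged, which is why the rule only causes trouble when the sending side omits its part.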
Tjaart and his team immediately implemented a fix so that the period in the text survives even when the SMTP server strips a period from the beginning of a line, and reported to users that the problem was resolved.
The issue was seemingly resolved with the application of a patch, but other users subsequently reported that 'for some users, decimal points were missing from the emails informing them of their monthly insurance premiums.'
Below is an example of an email with the decimal point correctly displayed.
In the email below, '$27.00' is displayed as '$2700'.
However, based on their previous experience, Tjaart and his team quickly identified the cause of the bug as 'the length of each line in the email body.' When the customer's name or other input was a certain number of characters long, the period was moved to the first character of the next line and deleted. A patch was immediately applied to the code, and the problem was resolved.
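The dependence on the customer's name can be reproduced with a small simulation. Everything here is an assumption made for illustration: the template text, the names, the 30-character limit, and the helper `received_text`. The received lines are joined without separators purely so the result can be compared with the original text; how a real mail client re-joins lines is not covered here.

```python
def received_text(body: str, limit: int) -> str:
    """Simulate hard-wrapping without dot-stuffing, then server unstuffing."""
    # Client: cut the body every `limit` characters, with no dot-stuffing.
    sent = [body[i:i + limit] for i in range(0, len(body), limit)]
    # Server: remove a leading period when other characters follow it.
    received = [
        line[1:] if line.startswith(".") and len(line) > 1 else line
        for line in sent
    ]
    # Joined without separators only to compare against the original text.
    return "".join(received)


# Hypothetical template and names, used only for this demonstration.
TEMPLATE = "Dear {name}, your premium is $27.00 per month."

if __name__ == "__main__":
    for name in ("Bob", "Anna"):
        print(received_text(TEMPLATE.format(name=name), 30))
```

With the three-letter name the decimal point is the last character of the first line and survives; with the four-letter name it becomes the first character of the second line and is stripped, turning '$27.00' into '$2700'.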
in Software, Posted by log1r_ut