Category Archives: Encoding

This is Hi-Fi

I apologise in advance for this rather extreme short essay. But it occurred to me that the processes music goes through in order to be reproduced via LP records would be in the realm of sado-masochistic novel writing if they had not already been invented.

This is meant to be comedy. But it’s not far from the truth!

This is Hi-Fi

So you’ve made this lush production, full-range sound, lots of dynamic range, in stereo. Perfectly edited, vivid, convincing audio with no noise or interruptions, and little distortion.

Oh, you’re filtering off everything below 40Hz, and mixing everything below 90Hz into mono, and dropping things above 16kHz? What on earth for? Double-basses are meant to be on the right of the orchestra aren’t they?

Now you’re going to change the frequency response of the whole thing quite madly so the high frequencies sear your ears while the low end is lost in hum? That sounds dreadful, to be honest.

You want to send it into a microscopic mechanical transducer?

But you’re not going to couple that transducer to another transducer with a rigid material, but instead attach it to a pointy thing and make it scratch its shape on a spinning piece of brittle plastic? And the plastic doesn’t stay at a constant speed, but starts around 50cm/sec but later slows down to below the speed of professional tape, maybe about 20cm/sec? And you have to put a vacuum cleaner behind the pointy sharp thing that carves so the bits it scratches away are sucked up? And the plastic doesn’t stay where it was put? And there’s dust in the room? And the plastic is affected by temperature?

Now you’re going to move that plastic into a chemical vat that looks more like a sheep-dip container than a concert hall, and you’re growing a layer of metal on it? Did I see you physically break the metal off the plastic? And you’re growing another layer on that, breaking it off like you burst open a bubble-pack; and you’re growing yet another layer on that? And after all that mess and bending and breakage, you’re squirting consumer-grade thermoplastic onto that and crushing it with the weight of a bus in the hope it might take up the same shape?

Are you sure that hole is exactly in the middle?

That thermoplastic isn’t really very hard, is it? But you say you want to take it through the open air into an average domestic room full of dust, and risk handling it outside any protective wrapping, with bare hands? What’s that platter you’re putting it on? Are you sure it’s clean?

Do you think it reasonable to force this malleable, soft, damageable, melt-prone plastic past a piece of incredibly-hard diamond kept in place by bits of rubber led by gravity and inertia, together with corrections that are almost impossibly small to determine? How on earth is this not going to damage the delicate plastic? The diamond isn’t even the same shape as the pointy thing that cut the hard plastic in the first place. And it isn’t meeting the slash in the plastic at quite the same angle the pointy thing made the cut in the harder plastic? And you balance the little diamond so carefully that it nearly but doesn’t leap out of the slash in the plastic, but it’s still deforming the plastic through friction because it’s so hard, and you’re slamming it into walls of soft plastic up to 20,000 times each second? What if you accidentally knock the little diamond?

Now you’re going to attach the diamond to another transducer the price of a house, and yet so delicate that the capacitance of the connecting cable affects the signal badly? And you’re going to roll off the top, boost the bottom so it rumbles, and amplify THAT so loudly that you can hear it but not so loudly that it, too, rattles the little diamond?

And THAT’s Hi-Fi?

Is home audio moving forward?

The other day, a friend asked a Facebook group how developments in domestic audio might shape up in the future.

Nothing, not even Dolby Atmos, beats the revelation I experienced, in the early 1980s, of

  1. being able to leave behind the distortion and interruptions of vinyl by using CD or tape; and
  2. being able to build better loudspeakers.

The story goes back fifty years, at the time of writing. In 1967, my father bolted a rescued Garrard AT6, the autochanger based on the SP25, into a polished brown radiogram cabinet and wired the crystal cartridge to the input of the built-in valve radio/amplifier. The cabinet had a richly toned single speaker underneath, and this system of glowing radio dial and filaments played his classical LPs, the 7-inch singles my mother inherited from her DJ brother, and my children’s records, together with some shellac 78s from second-hand shops and relatives, played by turning the crystal cartridge over in its headshell.

Amid the clunks and whirs of the mechanism, I was hooked on music of all kinds, and speech recordings too from companies like Saga and Delysé. On command, even a child could make dramatic magic happen in that warm-sounding loudspeaker. The amp with its lethal HT anode kicks was replaced in 1969 by a black box into which my father had installed a Sinclair IC10-based 10W amp, which lasted through the replacement of the cartridge with a ceramic item in 1971. Then, in 1972, he built (with me holding the circuit-boards steady sometimes) the Practical Electronics Gemini stereo amp set. In the years of mass-market stereo LPs on “Music for Pleasure” among others, we played this amp into mis-matched speakers rescued from Army surplus shops. “But it was a stereo hi-fi to US” (to misquote Eric Idle), and the leap forward into “sound sculpted in space” was never regretted.

A Goldring magnetic cartridge followed, and a tuner, and other improvements here and there; but, as a musician-in-training, even when young, it maddened me that the music didn’t sound like it did on the radio or in the schoolroom when we played. There were ticks and pops that you didn’t get from instruments nor did you hear them on the Radio 3 concerts. Sometimes the pitch wavered slowly. Nearly always the sound had a ‘fuzz’ with it toward the end of a side, particularly on French horns, muted brass and sopranos. None of this happened when I taped a cassette off the radio, and it was a crying shame that all recorded music for the masses had this gauze of dirt, this veil in front of it. I learned where every scratch was in the quiet passages of the symphonies and chamber music on the shelf, and was almost surprised when those noises didn’t occur where expected on radio performances. My parents didn’t mind the interruptions, but I knew this rubbish was not music, even though the frequency response ran smoothly from bottom to top and the amplifier’s distortion was almost below measurement. And it was sad that one poorly set-up pickup or arm could damage a precious recording for ever.

Later, in my early teens, the record-making process was unveiled to me, and it seemed strange that good tape copies of the masters were not sold to music-lovers. “Musicassettes” too often sounded muddy, though we rescued a reel-to-reel deck to play home-recorded tapes well; but no decent tapes were available unless recorded from Radio 3 concerts or the Big Band shows on Radio 2 which were superbly presented. As you can tell, I had no idea of the economics of producing tape versus vinyl LPs.

But soon after that I was engineering or producing my own student recordings of good concerts or bands in our departmental studio at Surrey University; and almost simultaneously with my leaving home, the CD came along. Teenagers (as I was then) can tell where 20kHz brickwall filters harm music, but at last, at long last, the music was almost completely pure. It did not waver or wobble. It was not interrupted by ticks and pops, nor by fuzz. Its quality was identical from beginning to end. There was no mourning that the beautiful passages of “En Saga” were harmed by being close to the end of the LP side. There was silence between the tracks or in the rests. Just like in the concert hall or recording session. And, later, it became clear that the recorded sound did not need to be sanitized (albeit, in the hands of a mastering engineer, very sympathetically and musically indeed) to survive the transition from tape to vinyl groove. After this, all else was candlelight. We didn’t have gas in the village where I grew up. (Note to the youngest readers: find Karajan’s statement on digital audio.)

Apart from the gradual increase in amplifier efficiency and very occasional leaps forward in speaker technology (my main speakers are twenty years old, though a better sub was added recently), to answer the original question, I have heard nothing since the advent of interruption-free recordings, whether digital or analogue, that improves my enjoyment of music or drama, except for one thing. The ability to compress the music to suit my listening environment is my primary nod to convenience. Where necessary, my in-car or ‘party’ music on memory sticks has broadcast-style processing added so I never touch the volume control anywhere on the road or while people are chatting over the guacamole (home made) and Cava.

What’s left? Convenience; curation; accessibility, discovery; that’s all, really — and a means of paying musicians properly, of course.

As I become older, one thing intrigues me. If I become thoroughly deaf, and need a cochlear implant, could I tune my computer, as a musical instrument, to the frequency channels and be able to hear (or compose) music arranged especially for those channels? Could that be a thing? Can there be music composed or arranged specifically to be heard at its best through the limited-pitch channels of a cochlear implant, so that permanently and profoundly deaf people might choose to try experiencing music in this way too?

I’d like to thank Mike Brown, very senior audio engineer and radio presenter, for asking the question that provoked this ramble. His website is, and you can read about his many regular radio shows here:

Illustrations are not mine; but are similar items seen on eBay or other websites

Metadata for Culture and Heritage

As part of my efforts with the ICOMOS-UK digital committee, I’ve started to collect metadata specifications relevant to heritage and culture.

My aim is to produce a superset with copious documentation and guides to subsets, so that all data is interchangeable. After all, if the Digital Production Partnership can join European and North American delivery standards for television in this way, isn’t anything possible?

Work begins at the link below. Suggestions are most welcome.

Audiovisual Archive Metadata & Preservation

Much faster Avid ingest from any format

The venerable FFmpeg audio/video tool can now package its output in Avid Op-Atom format directly, without always needing to have its output wrapped by the raw2bmx tool. This method is very fast and, crucially, can be used on any computer; not just the machine with your Avid licence. However, certain features of the Avid Op-Atom MXF wrapper are either not yet tested, or not available. For these features, I still advise using the bmxlib suite.

The advantage of this method is that video and audio data is very quickly imported into your Avid, at the full rate that the FFmpeg encoder can manage. Furthermore, your Avid will be using its native formats (e.g. DNxHD), rather than converting, say, XAVC on-the-fly with the AMA functions. The disadvantage is that metadata is quite messy, and lacking certain elements altogether, until I’ve figured out the full MXF Op-Atom metadata tags. Particularly, audio and video tracks are not linked into a single multi-track clip in an Avid bin: you must synchronise them yourself.

So, in this article, I will show you how to take video and/or audio from any format that FFmpeg will read, and immediately output MXF files in Avid-friendly codecs that Avid Media Composer’s “Media Tool” will pick up for you.

This post shows my earliest tests. I have not yet fully explored how to link files, or add other attributes, that raw2bmx adds or, indeed, Avid’s own import and capture tools add. However, the advantage of not using raw2bmx is a much faster import process.

Here, we start with a file of any format with a mono soundtrack, and output Avid DNxHD files at a resolution of 1280×720 and a bit-rate of 90Mbit/s, straight into the Avid MediaFiles directory, and ready for editing. This workflow assumes the frame-rate is for UK television, at 25fps, progressive encoding. I have not attempted to detect the number of audio channels in use, and therefore they are not encoded separately. This is, however, easy to achieve by using FFmpeg’s “asplit” audio filter and is detailed on the FFmpeg website.

ffmpeg -i "MY_CLIP.f4v" -vf scale=1280:720:lanczos -an -metadata project="MY PROJECT" -metadata material_package_name="MY CLIP" -b:v 90M -f mxf_opatom "M:\Avid MediaFiles\MXF\1\MY_CLIP_v1.mxf" -vn -metadata project="MY PROJECT" -ac 1 -ar 48000 -metadata material_package_name="MY CLIP" -f mxf_opatom "M:\Avid MediaFiles\MXF\1\MY_CLIP_a1.mxf"

Here is a break-down of that command line:

-i "MY_CLIP.f4v"

Here is the incoming clip.

-vf scale=1280:720:lanczos

We start with producing an output video MXF. Here, we ensure that the video is resized to the particular flavour of HD we’re editing in. In this case, we’re using 720p, and resizing using the algorithm I consider to be the best.


-an

The video output can contain only one track: video, in this case. This command instructs that the output must contain no audio. (Literally: “audio, none”)

-metadata project="MY PROJECT"

Here, we embed into the MXF metadata a value for “project”. This corresponds to your Avid project name. It is true that, upon analysing Avid’s own MXF files, the project name is contained within the metadata tag “project_name”, but my shorter tag appears also to work.

-metadata material_package_name="MY CLIP"

This is the name of your clip, as it will appear in your Avid bins and in the Media Tool.

-b:v 90M

Here, we set the bit-rate. Using FFmpeg’s built-in DNxHD encoder, you can choose from several bit-rates, which FFmpeg’s error messages will be happy to tell you about if you set this wrongly. We don’t explicitly set the encoder itself, because FFmpeg does that for you: its default for this muxer is DNxHD.

-f mxf_opatom

This explicitly instructs FFmpeg to wrap your data in an Op-Atom MXF wrapper, ready for your Avid Media Composer to use.

"M:\Avid MediaFiles\MXF\1\MY_CLIP_v1.mxf"

Finally, for the video file, here is the output file. In this case, I’m putting it straight into Avid’s media file storage area on my ‘M’ drive, for media. There is a "_v1" suffix so that the file is marked as a video file, for my own ease of comprehension.


-vn

After the video output file is named, we start listing the options for encoding the audio file. This first option instructs FFmpeg to produce an audio-only MXF file, without a video track. (Literally: “video, none”)

-metadata project="MY PROJECT"

As with the video file, we embed into the metadata the name of the Avid project that needs this file.

-metadata material_package_name="MY CLIP"

As above, this is the clip name as it will appear in your Avid bins or Media Tool.

-ac 1

This is a quick-and-dirty kludge to mix down all the incoming audio tracks to a single track. In real life, you’ll want to use FFmpeg’s filter “asplit” to handle each incoming audio track separately. At the moment, this command line produces only a single audio file, mixing together all incoming tracks.

-ar 48000

With this command, we convert the sample-rate of the incoming audio to the standard sampling rate for television: 48,000 samples per second. Avid can, of course, convert sample rates on-the-fly while editing, but it is better to perform this work at the import stage to give Avid less to do when editing. Again, the codec itself (pcm_s16le, meaning linear PCM, 16-bit, little-endian) is not explicitly specified because FFmpeg sets this as the default for Avid import.

-f mxf_opatom

As before, this explicitly instructs FFmpeg to wrap your data in an Op-Atom MXF wrapper, ready for your Avid Media Composer to use.

"M:\Avid MediaFiles\MXF\1\MY_CLIP_a1.mxf"

Here is the filename for the audio output. In this example, I have placed it in the same directory as the video output created earlier in this command line. It is suffixed with "_a1" for my own ease of comprehension.
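As a minimal sketch, assuming a POSIX shell (for example under Cygwin), the options broken down above can be gathered into a small function so that the clip, project and destination are typed only once. The function name avid_ingest_cmd and the example destination path are illustrative assumptions, not part of the workflow tested here, and the function only prints the command rather than running it:

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the two-output FFmpeg command
# described above. Function and variable names are illustrative assumptions.
avid_ingest_cmd() {
    src=$1        # incoming clip, any format FFmpeg can read
    project=$2    # Avid project name
    clip=$3       # clip name as it should appear in Avid bins
    dest=$4       # destination folder (assumed to be an Avid MediaFiles directory)

    # Metadata shared by the video and audio outputs
    meta="-metadata project=\"$project\" -metadata material_package_name=\"$clip\""
    # Video-only output: 720p, no audio, 90 Mbit/s, Op-Atom wrapper
    video="-vf scale=1280:720:lanczos -an $meta -b:v 90M -f mxf_opatom \"$dest/${clip}_v1.mxf\""
    # Audio-only output: mono mix-down at 48 kHz, no video, Op-Atom wrapper
    audio="-vn -ac 1 -ar 48000 $meta -f mxf_opatom \"$dest/${clip}_a1.mxf\""

    printf '%s\n' "ffmpeg -i \"$src\" $video $audio"
}

avid_ingest_cmd "MY_CLIP.f4v" "MY PROJECT" "MY_CLIP" "/path/to/Avid MediaFiles/MXF/1"
```

Printing the command first makes it easy to check the quoting before pasting it into a terminal on whichever machine does the encoding.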

There are additional options associated with this FFmpeg muxer, which are listed here. I have not experimented with these, but can see that the -mxf_audio_edit_rate might need to be adjusted for non-European television or film work e.g. 30000/1001 for American television work, or 24 or 24000/1001 for 24fps film work. Also, you would probably want to set the -signal_standard bt601 or -signal_standard 1 for standard definition television work.
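The edit-rate values mentioned above could be looked up with a small helper like the following. This is only a sketch: the frame-rate labels and the function name audio_edit_rate are assumptions made for illustration, and, as noted, the option itself has not been tested in this workflow.

```shell
#!/bin/sh
# Hypothetical helper: map a frame-rate label to the rational value that the
# text suggests passing to -mxf_audio_edit_rate. The labels are assumptions.
audio_edit_rate() {
    case "$1" in
        25)     echo "25/1" ;;          # European television (the muxer's default)
        29.97)  echo "30000/1001" ;;    # American television
        24)     echo "24" ;;            # 24fps film
        23.976) echo "24000/1001" ;;    # 24fps film at the 1000/1001 rate
        *)      echo "unknown frame rate: $1" >&2
                return 1 ;;
    esac
}

echo "-mxf_audio_edit_rate $(audio_edit_rate 29.97)"
```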

Muxer mxf_opatom [MXF (Material eXchange Format) Operational Pattern Atom]:

Common extensions:
Mime type:
Default video codec: dnxhd
Default audio codec: pcm_s16le

MXF-OPAtom muxer AVOptions:

-mxf_audio_edit_rate: Audio edit rate for timecode (from 0 to INT_MAX) (default 25/1)
-signal_standard: Force/set Sigal Standard (from -1 to 7) (default -1)
ITU-R BT.601 and BT.656, also SMPTE 125M (525 and 625 line interlaced)
ITU-R BT.1358 and ITU-R BT.799-3, also SMPTE 293M (525 and 625 line progressive)
SMPTE 347M (540 Mbps mappings)
SMPTE 274M (1125 line)
SMPTE 296M (750 line progressive)
SMPTE 349M (1485 Mbps mappings)

Showing Stream Structure with FFmpeg

Ever wondered what the structure of your H.264 or other motion-predicted video stream is? With FFmpeg and a Unix (e.g. Linux, BSD, Cygwin) command line, you can find out.

ffprobe -show_frames -select_streams v:0 YOURFILE.EXT | grep pict_type | sed s/pict_type=// | tr -d '\n'

The output is something like this:
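To show what the grep, sed and tr stages do without needing a video file, here is a small sketch that feeds the same chain a few hand-written lines in ffprobe's key=value layout. The sample lines are invented for illustration and do not come from a real file, and the function name frame_types is an assumption:

```shell
#!/bin/sh
# Same grep/sed/tr chain as in the command above, wrapped in a function
# that reads ffprobe-style text on standard input.
frame_types() {
    grep pict_type | sed 's/pict_type=//' | tr -d '\n'
}

# Invented sample input (not from a real file): one non-matching line
# followed by four frames.
printf 'media_type=video\npict_type=I\npict_type=B\npict_type=B\npict_type=P\n' | frame_types
echo    # finish the line; prints IBBP for this sample
```

Each letter is one frame, in the order ffprobe reports them, so a real file gives one long string of I, P and B characters.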


Open Source MPEG2 Video Encoding for DVD

An open-source codec project, x262, is attempting to bring the best of x264’s coding techniques to the encoding of MPEG2 video. X264 is a very popular, world-class open-source H.264 encoder.

The open-source multimedia utility FFmpeg has its own MPEG2 video encoder, but its quality falls far behind that of x264. However, whilst it is possible to incorporate x262 within FFmpeg, one then loses the ability to compile-in the most up-to-date x264 encoder.

This is because the x262 project generates a library and a command-line utility that replaces x264, and extends its capabilities. Unfortunately, x262 isn’t tracking the latest x264 encoder right now, so it is best compiled as a separate utility.

In my own compilation of FFmpeg and associated utilities, I have kept x262 and x264 separate, so this blog post will show how to encode for DVD using x262 and FFmpeg, fully retaining FFmpeg’s x264 capabilities, and encoding MPEG2 video at greater quality than FFmpeg’s native encoder for this.

The command line, given in full after the explanation below, is set up for a 25fps production, converting the video from a 24fps cinema file. This is for video only. You’ll use a separate command line for audio, then combine the two files, again using FFmpeg.

To explain:

-r 25
Tell FFmpeg to interpret the file as 25fps, so we get the 24->25 speed up necessary when showing a cinema film on European television.
-vf scale=720:576:lanczos,smartblur=1.0:-0.4,colormatrix=bt709:bt601,setdar=16/9,setsar=64/45
This is a video filter, and it does quite a lot. First, we scale the film to PAL DVD spec, that is 720 x 576 pixels, and we use the lanczos algorithm for best quality. Then, using FFmpeg’s smartblur algorithm, we add a little inverse blur, to sharpen the image without increasing noise too much. Next, the colour matrix is converted, because we’re coming out of Rec.709 colourspace (standard for HD television), and going into Rec.601 colourspace, for standard definition television. Finally come two filters that signal the display aspect ratio, and the sample aspect ratio. These are the standard widescreen aspect ratios for the screen and each pixel in 576i television.
-an
This ensures that there is no attempt to process audio.
-pix_fmt yuv420p -f yuv4mpegpipe - |
The video’s pixel format is changed to yuv420p if it was not in this format to begin with, and we pipe the video out using the yuv4mpeg format, which carries a simple header to instruct the program at the end of the pipe to interpret the video correctly.
x262 --fps 25 --demuxer y4m --mpeg2
These set up the x262 encoder to interpret the pipe’s input correctly using the y4m format, and insist that the frame-rate is 25fps. Then, we instruct x262 to behave as an MPEG2 encoder. This is necessary because the x262 binary also contains an H.264 encoder.
--preset placebo
This preset sets up some of x262’s slowest, and most careful, encoding options. It slows the encoder down to around 7fps on an ancient laptop (the test machine here), but this is not a particular problem for my purpose.
--open-gop
By permitting each group of pictures to contain B-frames that can refer to P-frames outside their own GOP, there is a slightly increased efficiency of encoding.
--tune film
This sets some tunings that result in less apparent distortion to the picture when encoding from a film or film-like source
--keyint 12
DVD specification for PAL discs requires that each group-of-pictures be 15 frames long or less. This parameter ensures this is the case. GOPs can be much longer, but discs encoded with GOPs of more than 12 frames might not play on all players.
--fake-interlaced
Interlaced encoding is less efficient than progressive encoding. But if your incoming source is progressive, you can instruct the encoder in this way to encode it as progressive footage, but still signal that it is interlaced in order to stay within what most DVD players expect.
--vbv-maxrate 8800 --vbv-bufsize 1835 --crf 1
Here is our bit-rate control. DVDs must never exceed 9,800kbit/s, so we allow up to 1,000kbit/s for audio and other overheads. Then we allow buffering up to 1,835kbits, which is the DVD player specification; and finally instruct the encoder to encode using the highest quality variable bit-rate within these parameters.
--range tv --colorprim bt470bg --transfer bt470bg --colormatrix bt470bg
These define, and signal in the encoder’s output bitstream, the colour parameters of the video. In particular, these state that the signal’s range is that used for television (16-235 for luminance (Y), 16-240 for Cr and Cb in 8-bit systems), and that the colour descriptions fit the Rec.601 standard, which is an update of BT.470BG.
--sar 16:9
SAR means “Sample Aspect Ratio”: in other words, the pixel’s aspect ratio. In television, this should really be called “Display Aspect Ratio”, because it describes the playback display aspect ratio, not the pixel. But it is so named in x262.
-o FILENAME.vob
Here, the output filename is given. We give it a VOB extension, because it is a Video OBject, as required by the DVD specification. Of course, at this stage, it contains video only until we add an MPEG stream containing sound in the next step, which is another blog post.
-
Do not forget this dash! It instructs x262 to receive its input from a pipe and not a file.
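The setsar=64/45 value in the video filter follows from the frame size and the 16:9 display shape: the pixel aspect ratio is the display aspect ratio multiplied by height over width. As a hedged illustration in plain POSIX shell arithmetic (the function names gcd and sar_for are my own, chosen for this sketch):

```shell
#!/bin/sh
# Greatest common divisor by Euclid's method, used to reduce the fraction.
gcd() {
    a=$1; b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b)); a=$b; b=$t
    done
    echo "$a"
}

# Pixel (sample) aspect ratio = display aspect ratio x height / width.
# Arguments: width height dar_numerator dar_denominator
sar_for() {
    width=$1; height=$2; dar_num=$3; dar_den=$4
    num=$((dar_num * height))
    den=$((dar_den * width))
    g=$(gcd "$num" "$den")
    echo "$((num / g)):$((den / g))"
}

sar_for 720 576 16 9    # widescreen 720 x 576; prints 64:45
```

Put another way, 720 pixels each 64/45 wide cover the same width as 1024 square pixels, and 1024 by 576 is exactly 16:9.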

Here is the complete command line just described. The output is an MPEG2 encoded video file, noticeably better in quality than a file of the same bandwidth produced by FFmpeg’s native MPEG2 encoder.

ffmpeg -r 25 -i 24FPS-FILM-FILE -vf scale=720:576:lanczos,smartblur=1.0:-0.4,colormatrix=bt709:bt601,setdar=16/9,setsar=64/45 -an -pix_fmt yuv420p -f yuv4mpegpipe - | x262 --fps 25 --demuxer y4m --mpeg2 --preset placebo --open-gop --tune film --keyint 12 --fake-interlaced --vbv-maxrate 8800 --vbv-bufsize 1835 --range tv --colorprim bt470bg --transfer bt470bg --colormatrix bt470bg --sar 16:9 --crf 1 -o FILENAME.vob -
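The --vbv-maxrate figure is simply the DVD ceiling less the allowance for audio and overheads. A minimal sketch of that arithmetic follows; the function name dvd_video_maxrate is hypothetical, and its default allowance is the 1,000kbit/s used in this post:

```shell
#!/bin/sh
# Video bit-rate ceiling = DVD limit minus whatever is reserved for audio
# and other overheads (all figures in kbit/s).
dvd_video_maxrate() {
    mux_ceiling=9800         # DVD limit quoted in the text
    overhead=${1:-1000}      # allowance for audio etc.; 1000 if not given
    echo $((mux_ceiling - overhead))
}

echo "--vbv-maxrate $(dvd_video_maxrate) --vbv-bufsize 1835"
```

If your audio needs less room, pass a smaller allowance, for example dvd_video_maxrate 448, and use the result in place of 8800.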

Make the Edirol or Roland UM-1 and UM-1X work on Windows 10

Roland, who make the venerable MIDI to USB interface the UM-1, and its more recent version the UM-1X, claim that they will not support Windows 10. And, indeed, when you install Windows 10 onto a machine with the UM-1X plugged in, it remains unrecognised by the new operating system.

You can fix this with a text editor. Remember also you must set your Windows installation NOT to enforce driver signing. Please see the comments (below) for how to do this.

  1. Download from Roland the driver archive for Windows 8.1. Its filename is
  2. Unpack the archive.
  3. If you have a 64-bit machine, browse within the archive to this folder:

  4. Open the file RDIF1009.INF in your favourite text editor.
  5. Edit line 33, changing:
  6. Edit line 42, changing:
    [Roland.NTamd64.6.2] …to

  7. Save this file and exit the editor.
    One successful user reported that he plugged-in the UM-1 at this point.
  8. Browse to your Device Manager by holding down the ‘Windows’ key, pressing ‘X’, then selecting “Device Manager” from the menu that appears.
  9. Double-click on your non-functioning UM-1, which will be labelled “Unknown device”.
  10. Select the second tab: “Driver”
  11. Click the “Update driver…” button
  12. Click “Browse my computer for driver software”
  13. Browse to the folder containing the file you have just edited
  14. Click ‘OK’ to select the directory, then click ‘Next’
  15. When Windows complains “Windows can’t verify the publisher of this driver software”, click “Install this driver software anyway”
  16. Wait until you see “Windows has successfully updated your driver software.”

Done! You’ve installed the Windows 8.1 driver for the Roland/Edirol UM-1X on Windows 10, even though Roland state they don’t support this device. I can confirm that the MIDI input works just fine.
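Before making the edits in steps 5 and 6, it can be worth keeping an untouched copy of the INF file and looking at the two lines concerned. The following is only a sketch, assuming a Unix-style shell on the Windows machine (Cygwin or similar); the function name show_inf_lines is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: back up an INF file, then print lines 33 and 42,
# the two lines named in the steps above.
show_inf_lines() {
    inf=$1
    cp "$inf" "$inf.bak"       # untouched copy, in case the edit goes wrong
    sed -n '33p;42p' "$inf"    # show only the two lines to be edited
}

# Usage, from the folder containing the unpacked driver:
#   show_inf_lines RDIF1009.INF
```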

Converting Avid Meridien MJPEG media with FFmpeg

One of the older native codecs in Avid, the Meridien motion-JPEG (MJPEG) codec, can be read by the FFmpeg utility and, therefore, converted into other more popular codecs and wrappers. However, there is a caveat concerning the luminance levels.

Meridien MJPEG files, whether wrapped as MXF or OMF, are signalled as full-range (0 – 255) files, but in fact they contain limited range (16 – 235) data, leaving room for super-white and super-black excursions. Their pixel format is read as yuvj422p but, in conversion to the usual yuv420p used for most consumer-oriented formats, the range is reduced yet again. So, this must be prevented.

Remember also that Avid’s Meridien files contain 16 blank lines at the top for VBI information. These need to be cropped.

There is a simple solution, using FFmpeg’s scaler. For SD 576-line files (“PAL” standard), include, early in the video filter chain, this:

scale=src_range=0:dst_range=1,crop=x=0:y=16:w=720:h=576


For 480-line (“NTSC” standard) files, use this:


A full command line to convert a UK television picture might be:

$ ffmpeg -i VIDEO.mxf -i AUDIO.wav -vcodec libx264 -acodec libfdk_aac -vf scale=src_range=0:dst_range=1,crop=x=0:y=16:w=720:h=576,setdar=16/9 -aspect 16:9 VIDEO-AND-AUDIO-H264.mkv
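The range-and-crop part of that filter chain differs between line standards only in the picture height, so it can be generated. A minimal sketch, in which the function name meridien_filter is an assumption and the caller supplies the active picture height:

```shell
#!/bin/sh
# Hypothetical helper: build the range-fix and VBI-crop filters used in the
# command above. The active picture height is supplied by the caller.
meridien_filter() {
    active_lines=$1    # 576 for the "PAL" example above
    # y=16 skips the 16 blank VBI lines at the top of the frame
    echo "scale=src_range=0:dst_range=1,crop=x=0:y=16:w=720:h=${active_lines}"
}

meridien_filter 576
```

The result can be dropped into -vf ahead of any other filters, as in the full command line above.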

Play Avid Meridien MXF or OMF with FFplay

You may have found an old Avid drive containing MXF or OMF files compressed with Meridien codecs. Sometimes these are known by their compression ratio, e.g. “2:1” or “14:1”.

Because of the combination of the MXF/OMF container and the Meridien codec, rarely found in modern software apart from Avid, these files can be difficult to play, even if your QuickTime installation contains the Avid-distributed codecs.

So how can you view these files for free?

Easy. Avid Meridien compression is actually MJPEG – Motion JPEG compression. The free and open-source utility FFmpeg has a sister player: FFplay. Even though it doesn’t know how to find an MJPEG codec inside an MXF OP1A wrapper, or an Avid OMF wrapper, you can tell it what to do with a simple command line. Then, you can view any Meridien-compressed MXF or OMF files on your drive.

As a guide, MXF video files are named in the following way:


“ID” is a hexadecimal string that Avid uses to track the media. The pattern for OMF files is similar.

When the letter ‘V’ follows a clip name, and is succeeded by a pair of digits, you’ve found a video file. Then, the command to play it is:

ffplay -f mjpeg CLIPV01.<ID>_<ID>.MXF

The trick is the “-f mjpeg” in the command line. This forces FFplay to interpret the file as containing data encoded as Motion JPEG.
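The naming rule above (a ‘V’ followed by two digits, then a dot) can be checked with a shell pattern, which helps when a drive holds many files. This is a sketch only: the function name is_meridien_video is hypothetical, and the two filenames are made-up placeholders with “ID” standing in for the real hexadecimal strings.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a filename has 'V' plus two digits
# immediately before a dot, as described above for video files.
is_meridien_video() {
    case "$1" in
        *V[0-9][0-9].*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Placeholder names; only the first matches the video pattern.
for f in "CLIPV01.ID_ID.MXF" "CLIPA01.ID_ID.MXF"; do
    if is_meridien_video "$f"; then
        echo "ffplay -f mjpeg \"$f\""
    fi
done
```

Replacing the placeholder list with a glob such as *.MXF prints one ffplay command per video file in the current folder.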

And now you can see your pictures. They’ll play with the VBI data included, and the colour range may appear washed out because you’re displaying broadcast-level pictures on a computer-level display.

Diplacusis — or why do some people hate violins?

tl;dr — I had very disturbing diplacusis (double hearing) during a really bad bout of influenza, but recovered after a month.

The Diplacusis Diary

Being a Tonmeister, and loving music all my life, I didn’t understand what drove some people, even those in my family, to dislike violins. Where I enjoyed beautiful, warm, expressive singing tone, they heard “tuneless cats wailing” or worse.

Whereas the main complainant among my relatives didn’t seem to mind piano music too much, orchestras and violins in particular were, to her, the equivalent of a knife edge being dragged squealing across a china plate.

How could there be such a difference?

Until last month, I had no idea. But now I know.

For three weeks, my right ear has presented me with hideously detuned ghost orchestras, squawking organ pipes, shrieking violins and cracked bells. Music encoded using codecs such as MP3 or AAC sounded like it was being played through loudspeakers whose cones had been torn apart, and any perception of stereo was lost: everything was shifted about 40° to the left, while demonic pitchless musicians wailed over my right shoulder. In short, all pleasure in music was replaced by agony, and my work as a performing musician, occasional record producer and film editor appeared finished.

This is an essay on the ailment diplacusis, and my journey to safety through it. To be more accurate, my particular case was diplacusis dysharmonica, where pitch is perceived normally in one ear, but wrongly in the other. This article is no substitute for a professional diagnosis and a course of therapy from a medical specialist, but it is published to show how a musician and amateur physicist (me) worked through the nightmare, and was healed by the brain and body’s own resources.

Yes, I’m better now and, indeed, most people recover without intervention. But, if you have begun a similar journey, please get checked by the best professional you can find because many different causes lead to the same ailment. Most triggers that the body can’t fix on its own can be cured by pharmaceutical or surgical intervention. Please don’t hesitate.

Where did it start?

I have normal hearing for a 51-year-old, gracefully growing older. There’s a little high-frequency tinnitus but nothing to worry about. Then, in May 2015 began my worst bout of influenza ever. This brought about the kind of coughing and congestion that kills older people.

While blowing my nose rather fiercely, I felt and heard something nasty, probably mucus, shoot up my right Eustachian tube and into my middle ear. Or perhaps too much pressure was used and something inside my middle ear became damaged?

Immediately, I felt a sense of pressure as if my ear needed to ‘pop’ and, as usual, there was a dullness of hearing. This is perfectly normal when the pressures either side of the tympanum are unequal. But also, there was a new acoustic effect, as if my eardrum were in direct physical contact with my throat. Breathing and swallowing became much louder than usual in this ear alone. And popping my ears to relieve pressure changed none of this.

So, in a very short space of time, I had an ear that felt completely full of something, and that would not respond to the normal procedures. The next day, I was checked by a doctor who wanted me to visit the audiology department at the hospital if things weren’t getting better. The tympanum is translucent, and an expert can diagnose much by shining bright light onto it.

What did I notice?

Day three dawned. Outside my house, off to the right from where I sit for my everyday work, there is a church. The bell, which was being tolled to call the congregation for the morning service, had developed a problem. It sounded as if it had been cracked, which was a pity because its sound was normally very pleasant, a reminder that this is a historic and pretty town. Later that day, there was space in the diary to visit the vicar to tell him about the sad accident that had happened in his bell tower in case he’d not noticed.

Then it was time to edit and master some music for a client. Despite the feeling of pressure in the right ear, sensitivity had returned so I fearlessly began work.

The first piece of music wasn’t from the usual excellent producer whose work normally went into this particular project, and the difference certainly showed! The whole choir was way off to the left in the stereo soundstage, and the MP4 audio file sounded terribly distorted, as if encoded at a very low bitrate. The right-hand channel, particularly, had incredible harmonic distortion and countless intermodulation products. I very nearly fired off a cheery email to my friend who usually provides this material, saying “it’s easy to tell this isn’t from you!”

Then I glanced at the meters and the waveform. The audio was in dual-channel mono. In other words, both audio streams were identical and panned dead centre. What on EARTH was I hearing? Were my speakers or amplifier blown?

I plugged my headphones into a separately amplified output. The sound was just as awful. But then the real horror began: turning the cans the other way around, the balance and wild distortion inside my head were identical, as if I’d not reversed the headphones at all.

So I checked just the left channel: and it was perfect. But with the right channel alone, not only was the sound like someone singing through a comb and paper, it was nearly a semitone sharp! The vocal timbre also sounded sped up, like a tape being played through a pitch shifter.

A first response

This was deeply unpleasant. “I’m broken!” was the first thought. After a lifetime of playing and loving music, and wondering why my mother didn’t like musical sounds at all, suddenly all my own pleasure in music was lost. The glory of stereo, “sound sculpted in space”, had gone. I could no longer tell if an instrument or singer was in tune. And judgement on matters of tonal balance was impossible.

Every day in the press, we read about people whose lives have been utterly ruined by accidents. Losing part of one ear is hardly equivalent to being crippled and confined to a wheelchair for ever. And if a person suddenly disabled can find a way through, it wouldn’t be too much trouble for me with one-and-a-half ears and all my limbs still working.

A bit sad for a musician and producer, though — the end of my lifetime’s ambition.

That afternoon, I played piano for a rehearsal. The whole echo of the church appeared routed through a pitch-shifter and screamed mockingly at me like a choir in the worst kind of horror movie.


So, that evening, there was time to analyse what was happening.

Speech? All sibilants on the left, and sounding sped-up in the right ear alone.

Sine waves? Fine up to about 2kHz, then bad intermodulation distortion when fed to both ears: and pitch shift above 2kHz in the right ear alone.

Playing the piano? Everything an octave above Middle C and higher was surrounded by a vile cluster of discordant tones.

What about fun with heavily-panned Beatles songs, where the vocals or an instrument are fully on one stereo channel or the other? The trumpet solo in “Penny Lane” was unlistenable in part, though the brain did a good job of pulling some of it back into pitch on its lower notes. Over this, I had no conscious control: it was rather like watching a remotely controlled machine at work.

The Nat ‘King’ Cole album “Welcome To The Club” has the vocals bizarrely panned entirely on one channel. You can see where I’m going with this! And, yes, he was singing a semitone sharp. So was my enjoyment of music and my professional judgement over for life?

Over the week that followed, experiments continued. Every morning I’d be woken by the church clock chiming with all its harmonics in the wrong pitch (though the fundamental tone was fine), then I’d try the piano: there were clusters of evil upper partials on every note, and harmonies brought no pleasure or contrast. And recorded music encoded with perceptual codecs still sounded as if played through a class B amplifier with terrible crossover distortion.

Thinking in Physics

What might have been happening inside my ear? The feeling of pressure was still there, and everything above about 1.5kHz was pitch-shifted up.

If the workings of the ear are unknown to you, I suggest that, at this point, you take a look at some Wikipedia entries, particularly those regarding the tympanum, the ossicles, the cochlea and the organ of Corti. Remember how travelling waves peak at different places along the basilar membrane, turning it into a spectrum analyser.

If you have access to a tone generator, try this: feed 2kHz or 3kHz into headphones, then clench your jaw strongly. Did you hear the pitch of the tone go up? Is the pressure on your ear affecting the bone holding your cochlea and therefore changing its shape, altering the places along the basilar membrane where different frequencies resonate, thereby fooling the brain into perceiving a different pitch?
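If no tone generator is to hand, one can be improvised. What follows is only a minimal sketch in Python, using nothing beyond the standard library: it builds a sine tone that plays in the left ear, the right ear, or both. The function names, the 44.1kHz sample rate and the modest default level are my own assumptions for illustration, not a recommendation from any audiologist. Keep the volume low.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # assumed sample rate, samples per second


def tone_frames(freq_hz, seconds, channel="both", amplitude=0.3):
    """Return 16-bit stereo PCM bytes holding a sine tone.

    channel is "left", "right" or "both"; the unused side stays silent.
    amplitude is a fraction of full scale, kept low by default.
    """
    n_samples = int(SAMPLE_RATE * seconds)
    peak = int(amplitude * 32767)
    frames = bytearray()
    for n in range(n_samples):
        value = int(peak * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        left = value if channel in ("left", "both") else 0
        right = value if channel in ("right", "both") else 0
        # Interleave as little-endian signed 16-bit: left sample, then right.
        frames += struct.pack("<hh", left, right)
    return bytes(frames)


def write_tone_wav(path, freq_hz, seconds, channel="both"):
    """Write the tone to a stereo WAV file for headphone listening."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # two bytes per sample, i.e. 16-bit
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(tone_frames(freq_hz, seconds, channel))
```

Calling, say, `write_tone_wav("tone_right.wav", 2000, 3.0, "right")` and then the same with `"left"` gives a pair of files for comparing the pitch heard in each ear, or for the jaw-clench experiment.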

Maybe something, maybe mucus, was putting pressure constantly on my cochlea, possibly on its oval window, permanently changing the places where resonance occurs when frequencies are higher than about 1.5kHz? This is in line with the place theory of pitch perception.
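To put a rough number on the place theory, here is a hedged sketch using Greenwood’s frequency-position function, a published empirical fit for the human cochlea. It is not something I measured, and the constants below are the commonly quoted human values, assumed here. The helper names are hypothetical. It estimates how far along the basilar membrane the point of resonance would have to move to account for a semitone of pitch shift.

```python
import math

# Greenwood's fit (assumed constants for the human cochlea):
#   f = A * (10 ** (a * x) - K)
# where x is the fractional distance along the membrane from the apex (0 to 1).
A, a, K = 165.4, 2.1, 0.88


def place_to_frequency(x):
    """Frequency in Hz that resonates at fractional position x."""
    return A * (10 ** (a * x) - K)


def frequency_to_place(f_hz):
    """Fractional position along the membrane where f_hz resonates."""
    return math.log10(f_hz / A + K) / a


def semitone_place_shift(f_hz, semitones=1.0):
    """Fractional distance between the places for f_hz and f_hz shifted up."""
    shifted = f_hz * 2 ** (semitones / 12)  # one semitone is a ratio of 2**(1/12)
    return frequency_to_place(shifted) - frequency_to_place(f_hz)
```

Under this fit, a semitone at 2kHz corresponds to a displacement on the order of one per cent of the membrane’s length, so a very small mechanical change would be enough, if the guess about pressure is right at all.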

And perhaps the audio that is normally heavily modified by the MP3 or AAC algorithms, disguised by the normal ear’s processes, is revealed in all its distortion by my suddenly revelatory but damaged cochlea? In other words, the spectral lines that these codecs decide to distort, lost in the ear’s usual perception, are shown in all their awfulness now that they are shifted for the benefit of my aural education.

How to fix my ear?

So at this point, about two weeks before writing this essay, I resolved to get through this in several ways.

  1. Using commonly available open source software, I could find where the frequency break in my damaged ear was, and design a process that maps frequencies above this break to slightly lower frequencies, thus restoring normal pitch perception for headphone use. Perhaps even a digital hearing-aid like this is possible?
  2. Middle ear infections cause pressure in the middle ear, so I was ready to do all that was possible to detect and clear any infection.
  3. I still had influenza and was very congested, so it would be useful to keep using Olbas Oil and pseudoephedrine to clear any other sinus and Eustachian tube blockages.
  4. Retrain my brain regarding pitch. After all, only after birth could my already-formed infant brain have compared the pitch sensations generated by the two ears and, somehow, correlated them, so why not try to restart the process?
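The first idea in the list can be sketched as a simple frequency map. This is a toy model under loud assumptions: that the damaged ear hears everything above a single break frequency as sharp by a fixed interval, with the 1.5kHz break and one-semitone shift taken from my own rough observations. The function names are hypothetical, and a real hearing-aid would need a proper pitch-shifter applied to the band above the break, not just this arithmetic.

```python
BREAK_HZ = 1500.0      # assumed break frequency, from rough listening tests
SHIFT_SEMITONES = 1.0  # assumed upward shift heard above the break


def perceived_frequency(f_hz, break_hz=BREAK_HZ, semitones=SHIFT_SEMITONES):
    """Toy model of the damaged ear: tones at or above the break sound sharp."""
    ratio = 2 ** (semitones / 12)
    return f_hz * ratio if f_hz >= break_hz else f_hz


def corrected_frequency(f_hz, break_hz=BREAK_HZ, semitones=SHIFT_SEMITONES):
    """Frequency to play so that the damaged ear hears roughly f_hz.

    Targets between break_hz and break_hz * ratio cannot be hit exactly
    under this step model, so they are clamped to the break frequency.
    """
    ratio = 2 ** (semitones / 12)
    if f_hz < break_hz:
        return f_hz  # below the break the ear is assumed to be accurate
    return max(f_hz / ratio, break_hz)
```

For a 3kHz target, `corrected_frequency(3000.0)` returns a slightly lower tone which the model ear then hears as 3kHz again. The abrupt step at the break is the weakest part of the sketch, since a real ear would presumably shift gradually.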

The strong upper harmonics in violins and pipe organs howled violently in my right ear: and, if my family member who hated such instruments also had unresolved diplacusis, perhaps this was the reason for her dislike of such sounds?


Now, the good news, for me at least. My ear has become decongested in the last week, and the shrill demonic orchestra and choir have faded to almost nothing. My stereo hearing is now back to its normal clean status, and music is a constant pleasure. I didn’t need to make my own hearing-aid, the decongestants seemed to work, and my self-training with tones and careful music listening perhaps helped too.

Sometimes, diplacusis can be healed in this way by the body and brain’s own natural functions. This has taken about a month for me.

If you have just experienced the very disturbing onset of diplacusis, maybe this essay has given you hope? But please get to a hearing specialist as soon as you can, in case your situation is different from mine, and you need surgical intervention.

And never blow your nose too hard.