Advanced Nerdy Nights #1: CHR Bank switching
Advanced Nerdy Nights #2: MMC1 CHR and PRG Bank switching, WRAM + Battery
Advanced Nerdy Nights #3: Horizontal background scrolling
Advanced Nerdy Nights #4: Sprite 0 hit for a status bar
Advanced Nerdy Nights #1: CHR Bank switching
To do the advanced lessons you should have already finished Pong.
CHR Bank Switching
Bank switching is exchanging one chunk of ROM for a different chunk while keeping everything in the same address range. Nothing is copied, so the switch happens instantly, and you can switch between banks whenever you want. The size and memory range of the banks depend on the mapper. For the CNROM mapper used in this article, the bank size is 8KB of CHR ROM. The whole 8KB range of PPU memory $0000-1FFF is switched at once, which means the graphics for all background tiles and sprite tiles are swapped together. In your game you may duplicate some tiles in multiple banks so they do not appear to change on screen. PRG is not bank switched, so it remains at the NROM limit of 32KB.
Set Mapper Number
The first part of adding bank switching is changing the mapper number your .NES file uses. Until now, the top of your code has contained:
.inesmap 0 ; mapper 0 = NROM, no bank swapping
The new line is:
.inesmap 3 ; mapper 3 = CNROM, 8KB CHR ROM bank swapping
This line in the header just tells the emulator to use CNROM to play your game. A list of other iNES mapper numbers can be seen at the wiki at
Set CHR Size
The next part is to increase the size of your CHR ROM. Change the .ineschr value from 1 to 2 to show that there are now two 8KB banks. CNROM can handle 32KB of CHR ROM (four 8KB banks), but this example will only use two.
Add CHR Data
The third part adds the data for the next bank into your game. Just make a new .bank statement below your current CHR one, giving it the next sequential number. When your code selects a bank to switch to, however, only the CHR banks are counted: PRG banks are ignored, so your original CHR bank will be #0 and the new one will be #1, whatever their .bank numbers are.
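To make the numbering concrete, here is a small Python sketch of the arithmetic. It assumes the layout of the earlier Nerdy Nights code, `.inesprg 1` (16KB of PRG, which NESASM splits into .bank 0 and .bank 1), followed by the CHR banks; the function name `cnrom_bank_for` is hypothetical.

```python
NESASM_BANK_SIZE = 8 * 1024   # every NESASM .bank is 8KB
INESPRG_UNIT = 16 * 1024      # .inesprg counts PRG in 16KB units


def cnrom_bank_for(nesasm_bank: int, inesprg: int) -> int:
    """Map a NESASM .bank number holding CHR data to the CNROM bank number.

    NESASM numbers the PRG banks first, then the CHR banks, but the CNROM
    register only counts CHR banks, so the PRG banks are subtracted off.
    """
    prg_banks = inesprg * INESPRG_UNIT // NESASM_BANK_SIZE
    if nesasm_bank < prg_banks:
        raise ValueError("that .bank holds PRG data, not CHR")
    return nesasm_bank - prg_banks


# With .inesprg 1, PRG uses .bank 0 and .bank 1, so CHR starts at .bank 2
print(cnrom_bank_for(2, 1))  # original CHR bank -> 0
print(cnrom_bank_for(3, 1))  # new CHR bank -> 1
```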
Bank Switching Code
The final part is to write your bank switching code. This subroutine takes a bank number in the A register and switches the CHR bank to it immediately. The actual switch is done by writing the desired bank number anywhere in the $8000-FFFF memory range. The cart hardware sees this write and changes the CHR bank.
  ... your game code ...
  LDA #$01        ;;put new bank to use into the A register
  JSR Bankswitch  ;;jump to bank switching code
  ... your game code ...

Bankswitch:
  STA $8000       ;;new bank to use
  RTS
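To make the effect of that write concrete, here is a toy Python model of the CNROM behavior described above. The class and method names are hypothetical, and it assumes, as a simplification, that bank bits beyond the banks actually present are ignored. It also ignores bus conflicts, which are covered next.

```python
class CNROM:
    """Toy model of CNROM CHR bank switching (ignores bus conflicts and timing)."""

    BANK_SIZE = 0x2000  # each CHR bank is 8KB

    def __init__(self, chr_rom: bytes):
        self.chr_rom = chr_rom
        self.num_banks = len(chr_rom) // self.BANK_SIZE
        self.bank = 0

    def cpu_write(self, addr: int, value: int) -> None:
        # A write anywhere in $8000-FFFF selects the CHR bank.
        if 0x8000 <= addr <= 0xFFFF:
            # Assumption: bank bits beyond the number of banks present are ignored.
            self.bank = value % self.num_banks

    def ppu_read(self, addr: int) -> int:
        # The whole PPU $0000-1FFF window comes from the selected 8KB bank.
        return self.chr_rom[self.bank * self.BANK_SIZE + (addr & 0x1FFF)]


# Two banks: bank 0 filled with $00, bank 1 filled with $11
cart = CNROM(bytes([0x00]) * 0x2000 + bytes([0x11]) * 0x2000)
cart.cpu_write(0x8000, 1)          # like LDA #$01 / STA $8000
print(hex(cart.ppu_read(0x0000)))  # 0x11
```

Every PPU address from $0000 to $1FFF changes at once, which is why all background and sprite tiles swap together.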
Bus Conflicts
When you start running your code on real hardware there is one catch to worry about. For basic mappers, the PRG ROM does not care whether it receives a read or a write command; it responds to both like a read, by putting its data on the data bus. This is a problem for bank switching, where the CPU is also trying to put data on the data bus at the same time. They electrically fight in a "bus conflict". The CPU could win, giving you the right value, or the ROM could win, giving you the wrong value. The solution is to make the ROM and CPU put the same value on the data bus, so there is no conflict. First a table of bank numbers is made, and then the value from that table is written to do the bank switch.
  ... code ...
  LDA #$01           ;;put new bank to use into A
  JSR Bankswitch     ;;jump to bank switching code
  ... code ...

Bankswitch:
  TAX                ;;copy A into X
  STA Bankvalues, X  ;;new bank to use
  RTS

Bankvalues:
  .db $00, $01, $02, $03  ;;bank numbers
The X register is used as an index into the Bankvalues table, so the value written by the CPU will match the value coming from the ROM.
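The following Python sketch contrasts the two approaches. It rests on a loudly labeled assumption: that conflicting bits resolve as a bitwise AND (0 wins), which is a simplification and not guaranteed by real hardware. The byte $78 standing at $8000 is an arbitrary example, and the function names are hypothetical.

```python
def bus_value(cpu_value: int, rom_value: int) -> int:
    """Value that ends up latched when CPU and ROM drive the bus together.

    Simplifying assumption: bits where the drivers disagree read as 0,
    so the result is a bitwise AND. Real hardware is not guaranteed to do this.
    """
    return cpu_value & rom_value


BANKVALUES = [0x00, 0x01, 0x02, 0x03]   # .db $00, $01, $02, $03


def bankswitch_with_table(bank: int) -> int:
    # STA Bankvalues,X with X = bank: the ROM drives BANKVALUES[bank] == bank.
    return bus_value(bank, BANKVALUES[bank])


def bankswitch_plain(bank: int, rom_byte_at_8000: int) -> int:
    # STA $8000: the ROM drives whatever unrelated byte is stored at $8000.
    return bus_value(bank, rom_byte_at_8000)


print([bankswitch_with_table(b) for b in range(4)])  # [0, 1, 2, 3]
print(bankswitch_plain(1, 0x78))                     # 0, not the bank we asked for
```

Whatever the real conflict behavior is, the table version never has a conflict to resolve, because both drivers put the same byte on the bus.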
Putting It All Together
Download and unzip the sample files. This set is based on the previous Week 5 code. Make sure the .asm file, mario0.chr, mario1.chr, and chrbanks.bat are in the same folder as NESASM3, then double click on chrbanks.bat. That will run NESASM3 and should produce chrbanks.nes. Run that NES file in FCEUXD SP to see small Mario.
Inside the LatchController subroutine a new section is added to read the Select and Start buttons from the controller. The Select button switches to CHR bank 0, and the Start button switches to CHR bank 1. Graphics of CHR bank 1 have been rearranged so Mario will change to a beetle. The tile numbers are not changed, but the graphics for those tiles are.
Open the PPU Viewer from the Tools menu in FCEUXD SP and try hitting the buttons. You can see all the graphics changing at once when the active bank switches.
Advanced Nerdy Nights #2: MMC1 CHR and PRG Bank switching, WRAM + Battery
This Week: The MMC1 is the first advanced mapper made by Nintendo. It is used in many games, including top titles like The Legend of Zelda. The main benefits are mirroring control, up to 256KB of PRG ROM, 128KB of CHR RAM or ROM, and 8KB of WRAM. The WRAM can be battery backed for saved games. This tutorial will cover all features of the MMC1 and how to use them. You should be comfortable with the normal Nerdy Nights series before starting. A simpler lesson on bankswitching is Advanced Nerdy Nights #1. If you only need one or two of the banking features, you may want to consider simpler and cheaper mappers such as UNROM or CNROM instead.
Carts using the MMC1 will have the S*ROM board code, like SNROM and SGROM. BootGod's NesCartDB database can be searched for which games use which boards. The ReproPak MMC1 board can also be used to build carts.
Shift Registers
The MMC1 uses a 5 bit shift register to temporarily store the banking bits. Shift registers were covered in Week 7. When writing to the register, data comes in from data bit 0 (D0) only. This is similar to reading the controller, where data is output on data bit 0. Every time a write happens, the current bits are shifted and D0 is inserted. The first bit you write eventually becomes the lowest bank bit. On the 5th write, when the shift register is full, the 5 bit value is copied to the banking register. At this point the bank switch happens immediately, without any delay. To load a bank register, the LSR instruction is used to shift the next bit down into D0 before each write, so five STA instructions with an LSR A between each one load all 5 bits.
Unlike simple mappers such as UNROM and CNROM, the MMC1 has no bus conflicts. The ROM is not enabled while you are writing, so you do not have to make the data you are writing match.
Data bit 7 is also connected to the MMC1. When a write with D7=1 happens to any banking register, the shift register is reset back to position 0, and it will then take another 5 writes to fully load the next value. All other bits of that write are ignored, and D0 is not loaded into the shift register. The PRG bits of the config register are also reset to their default values, as shown in the next section. Usually you will only reset the MMC1 once, at the very beginning of your program, by writing a value with bit 7 set (such as $80) to a register address like $8000.
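The serial loading and reset rules above can be modeled in a few lines of Python. This is a sketch for illustration, not emulator-grade code: `MMC1Serial` and `load_register` are hypothetical names, timing is ignored, and the starting register values are placeholders. It assumes the address of the 5th write picks the register, with $8000, $A000, $C000 and $E000 ranges numbered 0 to 3.

```python
class MMC1Serial:
    """Toy model of the MMC1's 5-bit serial load port (timing is ignored)."""

    def __init__(self):
        self.shift = 0
        self.count = 0
        # Register 0 = config ($8000), 1 = CHR bank 0 ($A000),
        # 2 = $C000-DFFF register, 3 = PRG bank ($E000).
        # Starting values are placeholders chosen for illustration.
        self.registers = [0x0C, 0, 0, 0]

    def write(self, addr: int, value: int) -> None:
        if value & 0x80:
            # D7 set: reset to position 0 and restore the default PRG config bits.
            self.shift = 0
            self.count = 0
            self.registers[0] |= 0x0C
            return
        # D0 enters at the top (bit 4), and earlier bits move down one place.
        self.shift = (self.shift >> 1) | ((value & 0x01) << 4)
        self.count += 1
        if self.count == 5:
            # The address of the 5th write picks the register.
            reg = (addr >> 13) & 0x03
            self.registers[reg] = self.shift
            self.shift = 0
            self.count = 0


def load_register(mmc1: MMC1Serial, addr: int, value: int) -> None:
    """Mimic the STA / LSR A pattern: five writes, shifting right between them."""
    for _ in range(5):
        mmc1.write(addr, value)
        value >>= 1  # LSR A


mmc1 = MMC1Serial()
mmc1.write(0x8000, 0x80)           # reset once at startup
load_register(mmc1, 0xE000, 0x05)  # select PRG bank 5
print(mmc1.registers[3])           # 5
```

Because the first write lands in bit 4 and is shifted right four more times, it ends up as bit 0, matching "the first bit you write eventually becomes the lowest bank bit".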
Config Register at $8000-9FFF
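To load the config register, do 5 writes to the $8000-9FFF range. The config bits are:
Config Register ($8000-9FFF)
76543210
   |||||
   |||++- Mirroring (0, 1: single screen; 2: vertical; 3: horizontal)
   ||+--- PRG Swap Range (0: $C000-FFFF swappable; 1: $8000-BFFF swappable)
   |+---- PRG Bank Size (0: 32KB; 1: 16KB)
   +----- CHR Bank Size (0: 8KB; 1: 4KB)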
Mirroring Config
Your program can change the mirroring at any point using these bits; you do not need to wait for vblank to change them. When using the MMC1, the .inesmir directive bit is ignored, so you must set the mirroring through your code. Mirroring values 0 and 1 are single screen mirroring modes: only 1KB is used for all nametables, so when scrolling, the screen will wrap both vertically and horizontally. Mirroring value 2 is the typical vertical mirroring, and 3 is horizontal mirroring.
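One way to picture the four mirroring values is to ask which 1KB page of nametable RAM each of the four nametable addresses ends up in. The Python sketch below does that; the function name is hypothetical, and it assumes value 0 selects the first 1KB page and value 1 the second.

```python
def nametable_page(ppu_addr: int, mirroring: int) -> int:
    """Which 1KB page of nametable RAM (0 or 1) a PPU $2000-2FFF address uses.

    mirroring is the 2-bit MMC1 value: 0 and 1 = single screen,
    2 = vertical, 3 = horizontal.
    """
    table = ((ppu_addr - 0x2000) // 0x400) & 0x03  # logical nametable 0-3
    if mirroring == 0:
        return 0              # single screen, first page
    if mirroring == 1:
        return 1              # single screen, second page
    if mirroring == 2:
        return table & 1      # vertical: 0 and 2 share a page, 1 and 3 share a page
    return table >> 1         # horizontal: 0 and 1 share a page, 2 and 3 share a page


for mode in range(4):
    print(mode, [nametable_page(a, mode) for a in (0x2000, 0x2400, 0x2800, 0x2C00)])
# 0 [0, 0, 0, 0]
# 1 [1, 1, 1, 1]
# 2 [0, 1, 0, 1]
# 3 [0, 0, 1, 1]
```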
PRG Bank Size Config
The MMC1 swaps PRG ROM in either 16KB or 32KB chunks. By default this bit is set to 1 for 16KB banks; clearing it to 0 enables 32KB banks. Notice these are not the same size as the 8KB NESASM banks, so the bank numbers will be different. When using 16KB banks the MMC1 banks are twice as big, so you must divide your NESASM bank number by 2 when writing it to the bank register. When using 32KB banks, divide by 4 to get the 32KB bank number; because the PRG bank register ignores bit 0 in 32KB mode, that number is written shifted left by one, into bits 3-1. 16KB banks are most commonly used, with the bulk of the code in the fixed bank and data/graphics/music in the swappable banks.
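As a worked example of the conversion, here is a small Python helper (hypothetical name) that turns an 8KB NESASM bank number into the value for the PRG bank register. In 32KB mode it relies on the register ignoring bit 0, as described in the PRG Bank Register section, so the 32KB bank number goes in bits 3-1.

```python
def mmc1_prg_value(nesasm_bank: int, sixteen_kb_mode: bool) -> int:
    """Value to load into the MMC1 PRG bank register for an 8KB NESASM bank."""
    if sixteen_kb_mode:
        return nesasm_bank // 2        # two 8KB NESASM banks per 16KB MMC1 bank
    bank_32kb = nesasm_bank // 4       # four 8KB NESASM banks per 32KB MMC1 bank
    return bank_32kb << 1              # 32KB mode reads bits 3-1; bit 0 is ignored


print(mmc1_prg_value(6, True))   # .bank 6 is in 16KB bank 3
print(mmc1_prg_value(8, False))  # .bank 8 is in 32KB bank 2, written as 4
```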
PRG Swap Range Config
When using 16KB banks set above, the PRG address range that gets swapped can be configured. If 32KB banks are used this bit is ignored and the entire $8000-FFFF range is swapped at once.
By default this bit is set to 1, making the $8000-BFFF range swappable while the $C000-FFFF range is fixed to the last bank of PRG. This matches the PRG swapping of the UNROM mapper and is most commonly used. Clearing this bit to 0 reverses this, so $8000-BFFF is fixed to the first bank of PRG and $C000-FFFF is swappable.
Changing the range or bank size can be useful for swapping audio samples, but you have to be careful to put the IRQ/reset/NMI vectors in every bank that can be loaded into the vector area at $FFFA-FFFF.
CHR Bank Size Config
Like the PRG, the CHR bank size can be configured to either 4KB or 8KB banks. With 8KB banks the whole $0000-1FFF range is one bank. With 4KB banks there are two banks, at PPU $0000-0FFF and $1000-1FFF. This can be used to keep the background in one bank and the sprites in the other. Then, for example, all the sprites could be swapped while the background stays the same.
CHR Bank 0 Register at $A000-BFFF
This is the register for CHR bank 0. To set it, do 5 writes to the $A000-BFFF range. When in 4KB CHR mode it selects a bank for PPU $0000-0FFF. The full 5 bit value is used, so there are 32 possible banks; each bank is 4KB, making 128KB of CHR maximum. When in 8KB CHR mode this register controls the full PPU $0000-1FFF. The bottom bit is ignored, so there are 16 possible banks; each bank is now 8KB, which is still 128KB maximum.
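The arithmetic behind "32 banks of 4KB" versus "16 banks of 8KB" can be checked with this Python sketch (hypothetical function name). It only models what CHR bank 0 controls: PPU $0000-0FFF in 4KB mode and the full $0000-1FFF in 8KB mode.

```python
CHR_4KB = 0x1000


def chr_rom_offset(reg_value: int, ppu_addr: int, four_kb_mode: bool) -> int:
    """CHR ROM offset selected by the MMC1 CHR bank 0 register.

    In 4KB mode this only models PPU $0000-0FFF; in 8KB mode it covers $0000-1FFF.
    """
    reg_value &= 0x1F                                      # 5-bit register
    if four_kb_mode:
        return reg_value * CHR_4KB + (ppu_addr & 0x0FFF)   # 32 banks of 4KB
    # 8KB mode: bottom bit ignored, leaving 16 banks of 8KB
    return (reg_value & 0x1E) * CHR_4KB + (ppu_addr & 0x1FFF)


print(hex(chr_rom_offset(3, 0x0010, True)))   # 0x3010
print(hex(chr_rom_offset(3, 0x1010, False)))  # 0x3010, same as writing 2
```

In both modes the largest reachable offset is $1FFFF, which is the 128KB maximum.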
PRG Bank Register at $E000-FFFF
This is the register for PRG banking. To set it, do 5 writes to the $E000-FFFF range. The bits are:
PRG Bank Register ($E000-FFFF)
76543210
    ||||
    ++++- PRG Bank
In 16KB PRG mode it selects a 16KB PRG bank for the current swappable address range. Only the 4 lower bits are used for 16 possible PRG banks. That is 256KB maximum. In 32KB PRG mode it selects a 32KB bank for the $8000-FFFF range. Only bits 3-1 are used for 8 possible banks. Bit 0 is ignored.
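Putting the PRG size, swap range, and bank register together, this Python sketch (hypothetical name) computes which PRG ROM byte a CPU address in $8000-FFFF reads. It assumes the bank register value is within the number of banks present, and that with the swap range bit cleared, $8000-BFFF is fixed to the first bank.

```python
BANK_16KB = 0x4000


def prg_rom_offset(cpu_addr: int, bank_reg: int, num_16kb_banks: int,
                   sixteen_kb_mode: bool = True, swap_8000: bool = True) -> int:
    """PRG ROM offset for a CPU address in $8000-FFFF under the MMC1 settings."""
    bank_reg &= 0x0F                                    # only the 4 lower bits
    if not sixteen_kb_mode:
        # 32KB mode: bit 0 ignored, whole $8000-FFFF swapped at once
        return (bank_reg & 0x0E) * BANK_16KB + (cpu_addr - 0x8000)
    upper_half = cpu_addr >= 0xC000
    if swap_8000:
        # $8000-BFFF swappable, $C000-FFFF fixed to the last bank
        bank = num_16kb_banks - 1 if upper_half else bank_reg
    else:
        # $8000-BFFF fixed to the first bank, $C000-FFFF swappable
        bank = bank_reg if upper_half else 0
    return bank * BANK_16KB + (cpu_addr & 0x3FFF)


# 256KB of PRG = 16 banks of 16KB; the reset vector always comes from the last bank
print(hex(prg_rom_offset(0xFFFC, 3, 16)))  # 0x3fffc
```

This also shows why the vectors matter: in the default mode $FFFA-FFFF always comes from the last bank, but in the other modes it comes from whichever bank is swapped in.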
Advanced Nerdy Nights #3: Horizontal background scrolling
This Week: Time to learn how to do horizontal background scrolling, like Super Mario Bros. Hopefully it is explained with the easiest-to-understand code possible. There is no compression, no buffers, and no metatiles, so only the ideas of scrolling are presented. Once you understand the scrolling part, you should look into those other topics to save code/data space and increase performance if needed.
Nametable Review
Before starting the scrolling you must fully understand how nametables work. One nametable is 32x30 background tiles, which covers exactly one visible screen. Including the attribute table, each screen needs 1KB of PPU RAM. The NES PPU has the address space for 4 nametables ($2000, $2400, $2800, $2C00) in a 2x2 grid:
  +---------+---------+
  |    0    |    1    |
  |  $2000  |  $2400  |
  +---------+---------+
  |    2    |    3    |
  |  $2800  |  $2C00  |
  +---------+---------+
Vertical mirroring means the nametables stacked vertically contain the same data: 0 ($2000) is a mirror of 2 ($2800), and 1 ($2400) is a mirror of 3 ($2C00). 0 and 1 are next to each other and have different data, which is what we want for horizontal scrolling. When you are looking at nametable 0 and scroll to the right, nametable 1 will come into view. Typically your mirroring setting is the opposite of the scrolling direction. To select vertical mirroring, set this in the iNES header:
  .inesmir 1   ; 1 = vertical mirroring
Scroll registers
Before scrolling we will fill both nametables 0 ($2000) and 1 ($2400). The same data will be copied into both, except the attribute table will be different. By setting the second nametable attributes to another color palette the two screens will have a very visible difference.
This sample code just increments the horizontal scroll register ($2005) by 1 on every frame. You can see when the first nametable scrolls off the screen, the second one comes on screen. The previously set colors make the split between nametables obvious. As the scroll register wraps from 255 to 0 the first nametable becomes completely visible again. You can also see the sprites are not affected by the scroll registers. They have their own separate x and y position data.
The full code and compiled .NES file are available from the download link at the bottom of this tutorial. scrolling1.asm includes everything up to this point.
Nametable Register
The problem with using just the scroll register is that it isn't big enough. In the previous example the scroll wrapped from 255 to 0, so the second nametable is never shown on the left side. Both nametables together are 512 pixels wide, but the scroll can only count 256 pixels. The solution is to switch which nametable is on the left side of the screen at the same time the scroll register wraps to 0.
Vertical mirroring means nametables are arranged horizontally
Scrolling shows nametable 0 and 1 (blue) on the screen (red)
When the scroll register wraps, nametable 0 is displayed again
Swap which nametable is on the left when the wrap happens to display nametable 1
To set the starting nametable, change bit 0 of the PPU control register at $2000. Clearing it to 0 will put nametables 0 and 2 on the left side of the screen with 1 and 3 to the right. Setting it to 1 will put 1 and 3 on the left, and 0 and 2 on the right.
This sample code has the same scroll incrementing, but swaps the nametables at the same time the scroll wraps from 255 to 0. Instead of the background jumping it continuously scrolls from one nametable to the next. When the scroll wraps again the nametables are swapped again and the scrolling keeps going.
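The combined scroll and nametable bit behave like one 9-bit position (0-511). This Python sketch (hypothetical function name) shows the per-frame update: the 8-bit value is what goes to $2005, and `nametable` is bit 0 of $2000.

```python
def advance_scroll(scroll: int, nametable: int, step: int = 1):
    """Advance the horizontal scroll, swapping nametables when it wraps past 255.

    scroll is the value written to $2005; nametable is bit 0 of $2000.
    """
    scroll += step
    if scroll > 255:
        scroll -= 256        # the 8-bit scroll register wraps to 0...
        nametable ^= 1       # ...so flip which nametable is on the left
    return scroll, nametable


scroll, nametable = 250, 0
for _ in range(10):
    scroll, nametable = advance_scroll(scroll, nametable)
print(scroll, nametable)  # 4 1
```

Reading the pair as `nametable * 256 + scroll` gives 260, exactly 10 pixels past the starting 250, so the background scrolls continuously across the boundary.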
The full code and compiled .NES file are available from the download link at the bottom of this tutorial. scrolling2.asm includes everything up to this point.
Drawing New Columns
For just two screens of graphics the code above is fine. Games like Super Dodgeball use this method. Both nametables are filled and scrolled between. For games like SMB where the levels are wider than two screens some new background data will have to be inserted. The solution is to draw a new vertical column of tiles somewhere off the visible screen, before it is scrolled into the visible area. As long as the new column is drawn ahead of the visible area, calculated by the current scroll and nametable, it will appear continuous. The tricky part is figuring out which column to draw, and where it is to be placed. If we always use the opposite nametable and the same scroll point we will be drawing the column that is about to come on screen.
When to Draw
We will draw a new column any time the scroll register becomes a multiple of 8, meaning the scroll is aligned to the tiles. Some bit masking and testing can detect when this happens. First, every bit except the low 3 is masked off, leaving a value from 0 to 7. If the result equals 0, the scroll count is a multiple of 8.
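The mask-and-test is a single AND followed by a compare against 0. In Python (hypothetical function name):

```python
def should_draw_column(scroll: int) -> bool:
    """True when the scroll is a multiple of 8 (aligned to the tiles)."""
    return (scroll & 0b00000111) == 0   # keep only the 0-7 part of the scroll


triggers = [s for s in range(256) if should_draw_column(s)]
print(len(triggers))  # 32, one per tile column
```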
Where to Draw
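Now that we know when to draw, we need to calculate the starting PPU address of the new column. The scroll register counts in pixels, but we want to count in tiles to know which column to draw. Each tile is 8 pixels wide, so we divide the scroll by 8 to get the tile column number (0-31). That number is the low byte of the address. The high byte selects the nametable to draw into, which is the opposite of the one currently on the left ($20 for nametable 0, $24 for nametable 1).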
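Here is the address calculation as a Python sketch (hypothetical function name), assuming, as described in Drawing New Columns, that the column goes into the opposite nametable at the same scroll point.

```python
def new_column_address(scroll: int, nametable: int) -> int:
    """PPU address of the top tile in the column to draw next.

    Assumes the column goes in the opposite nametable at the same scroll point.
    """
    target = nametable ^ 1            # opposite nametable: 0 <-> 1
    high = 0x20 + (target << 2)       # $20 for nametable 0, $24 for nametable 1
    low = scroll >> 3                 # divide by 8: tile column 0-31
    return (high << 8) | low


print(hex(new_column_address(0, 0)))    # 0x2400
print(hex(new_column_address(248, 1)))  # 0x201f
```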
How to Draw
Previously, when copying data to the background, the PPU was set to auto increment the address by 1. That helps with the copying because a whole row of data can be copied while writing the PPU address only once. Incrementing by 1 goes to the next horizontal tile. In this case we want to go to the next vertical tile, because we are copying a column instead of a row. We want the address to increment by 32, which jumps down instead of across: there are 32 tiles per row, so adding 32 always goes down to the next row in the same column. The PPU has an increment 32 mode, set using bit 2 of the PPU control register at $2000. When bit 2 is 0 the increment mode is +1; when bit 2 is 1 the increment mode is +32. By setting the increment mode to +32 and copying 30 bytes of background tiles, we can draw one column at a time.
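This Python sketch lists the addresses the PPU visits while 30 bytes are written in +32 mode, to show that they stay in one column and stop just before the attribute table at offset $3C0. The constant and function names are hypothetical.

```python
PPUCTRL_INC32 = 0b00000100   # bit 2 of $2000: 0 = +1, 1 = +32


def column_addresses(start: int, rows: int = 30) -> list:
    """Addresses the PPU writes to when 30 bytes are copied in +32 mode."""
    return [start + 32 * row for row in range(rows)]


addrs = column_addresses(0x2401)
print(hex(addrs[0]), hex(addrs[1]), hex(addrs[-1]))  # 0x2401 0x2421 0x27a1
```

Remember to clear bit 2 again before any code that expects the normal +1 increment.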
Using the when, where, and how above, we can draw a new column of data off screen before it becomes visible. The full code and compiled .NES file are available from the download link at the bottom of this tutorial. scrolling3.asm includes everything up to this point. It is best to watch this in an emulator that shows everything that is off screen. First open the scrolling3.nes file in the FCEUXD SP emulator. Then choose "Name Table Viewer..." from the "Tools" menu. Reset the emulator and watch the new columns being drawn outside the visible screen area.
Drawing Real Background Data
The last example drew new columns, but they weren't real data. This example adds another counter to keep track of how far into the level the player is. By incrementing this counter every time a new column is drawn, the correct next column is easy to find. The DrawNewColumn function has been updated to use the counter to load real background data. It can also be used during game initialization to populate the starting nametable data instead of using the fill loops.
The full code and compiled .NES file are available from the download link at the bottom of this tutorial. scrolling4.asm includes 4 screens (128 columns) of real background ripped from SMB.
Updating the Attributes
The final piece of the scrolling puzzle is the attribute table. Updating it is the same process as for the background: the attributes are updated while they are off screen. Again the scroll and nametable registers are used to calculate the correct attribute bytes to update. Each attribute byte covers a 4x4 tile area. 4 tiles wide is 32 pixels, so the attributes must be updated any time the scroll register is a multiple of 32. The column numbers already calculated could be used instead of the scroll variables to do the calculations.
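This Python sketch (hypothetical function name) computes the 8 attribute bytes for one 32-pixel strip. It assumes, like the tile columns, that the strip goes into the opposite nametable, and uses the fact that the attribute table starts right after the 960 tile bytes, at offset $3C0, with 8 bytes per attribute row.

```python
def attribute_addresses(scroll: int, nametable: int) -> list:
    """Attribute bytes covering the 32-pixel strip about to come on screen.

    Assumes, like the tile columns, that the strip is in the opposite nametable.
    """
    target = nametable ^ 1
    base = 0x2000 + target * 0x400 + 0x3C0  # attribute table starts after 960 tiles
    column = scroll >> 5                     # divide by 32: attribute column 0-7
    return [base + row * 8 + column for row in range(8)]  # 8 bytes per attribute row


scroll = 64
if (scroll & 0b00011111) == 0:   # multiple of 32
    print([hex(a) for a in attribute_addresses(scroll, 0)])
```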
Once you have understood everything here, there are some more advanced concepts to check out:
Meta Tiles - The idea is to store your backgrounds as bigger blocks instead of individual tiles. Things like the question blocks would be stored as one byte in the ROM and then decoded into 4 tiles when drawn. Mostly this saves huge amounts of data space, and it can make updating attributes easier.
Buffers - A section of RAM can be reserved as a buffer for data to be drawn to the PPU later. Outside of vblank, where there is more processing time, the next graphics updates would be calculated and stored in a buffer. Then during vblank the buffer can be dumped straight to the PPU, saving time.
Compression - Packing the background data into simple compression formats like RLE can save even more data space. Combine that with meta tiles and buffers to have a full scrolling engine.
Putting It All Together
Download and unzip the sample files. Each of them adds a small step, so go through them one at a time. Try expanding the background data to add more columns, making the scroll speed variable, or making the scrolling controllable.
Audio Tutorial Series
Nerdy Nights Sound intro: About the Nerdy Nights Sound series
Nerdy Nights Sound: Part 1: make a music/sfx engine
Nerdy Nights Sound: Part 2: Square 2 and Triangle Basics
Nerdy Nights Sound: Part 3: Periods and lookup tables
Nerdy Nights Sound: Part 4: sound engine skeleton
Nerdy Nights Sound: Part 5: Sound Data, Pointer Tables, Headers
Nerdy Nights Sound: Part 6: Tempo, Note Lengths, Buffering and Rests
Nerdy Nights Sound: Part 7: Volume Envelopes
Nerdy Nights Sound: Part 8: Opcodes and Looping
Nerdy Nights Sound: Part 9: Finite Loops, Key Changes, Chord Progressions
Nerdy Nights Sound: Part 10: Simple Drums
Nerdy Nights Sound: Part 1: make a music/sfx engine
Music and sound effects on the NES are generated by the APU (Audio Processing Unit), the sound chip inside the CPU. The CPU "talks" to the APU through a series of I/O ports, much like it does with the PPU and joypads.
PPU: $2000-$2007
Joypads: $4016-$4017
APU: $4000-$4015, $4017
The APU has 5 channels: Square 1, Square 2, Triangle, Noise and DMC. The first four play waves and are used in just about every game. The DMC channel plays samples (pre-recorded sounds) and is used less often.
The square channels produce square waveforms. A square wave is named for its shape. It looks like this:
As you can see the wave transitions instantaneously from its high point to its low point (where the lines are vertical). This gives it a hollow sound like a woodwind or an electric guitar.
The triangle channel produces triangle waveforms. A triangle wave is also named for its shape. It looks like this:
The sound of a triangle wave is smoother and less harsh than a square wave. On the NES, the triangle channel is often used for bass lines (in low octaves) or a flute (in high octaves). It can also be used for drums.
The noise channel has a random generator, which makes the waves it produces sound like... noise. This channel is generally used for percussion and explosion sounds.
The DMC channel plays samples, which are pre-recorded sounds. It is often used to play voice recordings ("Blades of Steel") and percussion samples. Samples take up a lot of ROM space, so not many games make use of the DMC channel.
Enabling Channels
Before you can use the channels to produce sounds, you need to enable them. Channels are toggled on and off via port $4015:
APUFLAGS ($4015)
76543210
   |||||
   ||||+- Square 1 (0: disable; 1: enable)
   |||+-- Square 2
   ||+--- Triangle
   |+---- Noise
   +----- DMC
Here are some code examples using $4015 to enable and disable channels:
  lda #%00000001
  sta $4015       ;enable Square 1 channel, disable others

  lda #%00010110
  sta $4015       ;enable Square 2, Triangle and DMC channels. Disable Square 1 and Noise.

  lda #$00
  sta $4015       ;disable all channels

  lda #$0F
  sta $4015       ;enable Square 1, Square 2, Triangle and Noise channels. Disable DMC.
                  ;this is the most common usage.
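The $4015 value is just the enabled channel bits ORed together. This Python sketch (hypothetical names) builds such values and decodes them back, which can also help answer the debugger exercise below.

```python
SQUARE1, SQUARE2, TRIANGLE, NOISE, DMC = 0x01, 0x02, 0x04, 0x08, 0x10
NAMES = {SQUARE1: "Square 1", SQUARE2: "Square 2", TRIANGLE: "Triangle",
         NOISE: "Noise", DMC: "DMC"}


def apu_flags(*channels: int) -> int:
    """Build the byte to write to $4015 from the channels to enable."""
    value = 0
    for channel in channels:
        value |= channel
    return value


def enabled_channels(value: int) -> list:
    """Decode a $4015 value (e.g. one seen in the debugger) into channel names."""
    return [name for bit, name in NAMES.items() if value & bit]


print(hex(apu_flags(SQUARE2, TRIANGLE, DMC)))  # 0x16
print(enabled_channels(0x0F))  # ['Square 1', 'Square 2', 'Triangle', 'Noise']
```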
Try opening up some of your favorite games in FCEUXD SP and set a breakpoint on writes to $4015. Take a look at what values are getting written there. If you don't know how to do this, follow these steps:
1. Open FCEUXD SP
2. Load a ROM
3. Open up the Debugger by pressing F1 or going to Tools->Debugger
4. In the top right corner of the debugger, under "BreakPoints", click the "Add..." button
5. Type "4015" in the first box after "Address:"
6. Check the checkbox next to "Write"
7. Set "Memory" to "CPU Mem"
8. Leave "Condition" and "Name" blank and click "OK"
Now FCEUX will pause emulation and snap the debugger anytime your game makes a write (usually via STA) to $4015. The debugger will tell you the contents of the registers at that moment, so you can check what value will be written to $4015. Some games will write to $4015 every frame, and some only do so once at startup. Try resetting the game if your debugger isn't snapping.
What values are being written to $4015? Can you tell what channels your game is using?
Square 1 Channel
Let's make a beep. This week we'll learn how to produce a sound on the Square 1 channel. The Square channels are everybody's favorites because you can control the volume and tone and perform sweeps on them. You can produce a lot of interesting effects using the Squares.
Square 1 is controlled via ports $4000-$4003. The first port, $4000, controls the duty cycle (ie, tone) and volume for the channel. It looks like this:
SQ1_ENV ($4000)
76543210
||||||||
||||++++- Volume
|||+----- Saw Envelope Disable (0: use internal counter for volume; 1: use Volume for volume)
||+------ Length Counter Disable (0: use Length Counter; 1: disable Length Counter)
++------- Duty Cycle
For our purposes, we will focus on Volume and Duty Cycle. We will set Saw Envelope Disable and Length Counter Disable to 1 and then forget about them. If we leave Saw Envelopes on, the volume of the channel will be controlled by an internal counter. If we turn them off, WE have control of the volume. If WE have control, we can code our own envelopes (much more versatile). Same thing with the Length Counter. If we disable it, we have more control over note lengths. If that didn't make sense, don't worry. It will become clearer later. For now we're just going to disable and forget about them.
Volume controls the channel's volume. It's 4 bits long so it can have a value from 0-F. A volume of 0 silences the channel. 1 is very quiet and F is loud.
Duty Cycle controls the tone of the Square channel. It's 2 bits long, so there are four possible values:
00 = a weak, grainy tone. Think of the engine sounds in RC Pro-Am. (12.5% Duty)
01 = a solid mid-strength tone. (25% Duty)
10 = a strong, full tone, like a clarinet or a lead guitar (50% Duty)
11 = sounds a lot like 01 (25% Duty negated)
The best way to know the difference in sound is to listen yourself. I recommend downloading FamiTracker and playing with the different Duty settings in the Instrument Editor.
For those interested, Duty Cycle actually refers to the percentage of time that the wave is in "up" position vs. "down" position. Here are some pictures:
[Figure: square waves for each Duty Cycle setting, including 25% negated]
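For readers who prefer numbers to pictures, here is one period of each duty setting as 8 steps, with 1 for "up" and 0 for "down". The step patterns are assumed to match the sequences commonly documented for the NES APU; the dictionary and function names are hypothetical.

```python
# One period of each duty setting, in 8 steps (1 = "up", 0 = "down").
DUTY_SEQUENCES = {
    0b00: [0, 1, 0, 0, 0, 0, 0, 0],  # 12.5%
    0b01: [0, 1, 1, 0, 0, 0, 0, 0],  # 25%
    0b10: [0, 1, 1, 1, 1, 0, 0, 0],  # 50%
    0b11: [1, 0, 0, 1, 1, 1, 1, 1],  # 25% negated
}


def percent_up(duty: int) -> float:
    steps = DUTY_SEQUENCES[duty]
    return 100 * sum(steps) / len(steps)


for duty in DUTY_SEQUENCES:
    print(f"{duty:02b}: {percent_up(duty)}% up")
```

"Negated" means the 25% wave flipped upside down: every up step becomes down and vice versa, so it is up 75% of the time.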
Don't sweat it if graphs and waves aren't your thing. Use your ears instead.
Here's a code snippet that sets the Duty and Volume for the Square 1 channel:
  lda #%10111111  ;Duty 10 (50%), volume F (max!)
  sta $4000
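To see where %10111111 comes from, this Python sketch (hypothetical function name) packs the four SQ1_ENV fields into one byte, with both disable bits set to 1 as recommended above.

```python
def sq_env_byte(duty: int, volume: int, length_counter_disable: int = 1,
                saw_envelope_disable: int = 1) -> int:
    """Pack the SQ1_ENV ($4000) fields into one byte."""
    if not (0 <= duty <= 3 and 0 <= volume <= 15):
        raise ValueError("duty is 2 bits (0-3), volume is 4 bits (0-F)")
    return ((duty << 6) | (length_counter_disable << 5)
            | (saw_envelope_disable << 4) | volume)


print(f"{sq_env_byte(0b10, 0xF):08b}")  # 10111111, the value used above
```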
$4001 controls sweeps for Square 1. We'll skip them for now.
Setting the Note
$4002 and $4003 control the period of the wave, or in other words what note you hear (A, C#, G, etc). Periods are 11 bits long: $4002 holds the low 8 bits and $4003 holds the high 3 bits of the period. We'll get into more detail in a future tutorial, but for now just know that changing the values written to these ports will change the note that is played.
SQ1_LO ($4002)
76543210
||||||||
++++++++- Low 8 bits of period

SQ1_HI ($4003)
76543210
||||||||
|||||+++- High 3 bits of period
+++++---- Length Counter
The Length Counter, if enabled, controls how long the note is played. We disabled it up in the $4000 section, so we can forget about it for now.
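As a preview of the details to come, the square channel's frequency on an NTSC system is the CPU clock divided by 16 × (period + 1). The Python sketch below (hypothetical names) inverts that to find a period and splits it across $4002 and $4003. It assumes the NTSC CPU clock of 1,789,773 Hz and leaves the Length Counter bits of $4003 at 0, as in the example below. With this formula the $0C9 used below works out to about 554 Hz.

```python
NTSC_CPU_HZ = 1_789_773   # NTSC CPU clock (about 1.79 MHz)


def square_period(freq_hz: float) -> int:
    """11-bit square channel period for a target frequency (NTSC).

    Uses f = CPU / (16 * (period + 1)), solved for period and rounded.
    """
    period = round(NTSC_CPU_HZ / (16 * freq_hz) - 1)
    if not 0 <= period <= 0x7FF:
        raise ValueError("frequency out of range for an 11-bit period")
    return period


def split_period(period: int):
    """Return (value for $4002, high 3 bits for $4003)."""
    return period & 0xFF, (period >> 8) & 0x07


lo, hi = split_period(square_period(440.0))   # A above middle C
print(hex(lo), hex(hi))   # 0xfd 0x0
```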
Here is some code that will produce an eternal beep on the Square 1 channel:
  lda #%00000001
  sta $4015       ;enable square 1

  lda #%10111111  ;Duty 10, Volume F
  sta $4000

  lda #$C9        ;$0C9 is a C# in NTSC mode
  sta $4002       ;low 8 bits of period
  lda #$00
  sta $4003       ;high 3 bits of period
Putting It All Together
Download and unzip the sample files. All the code above is in the square1.asm file. Make sure square1.asm and square1.bat are both in the same folder as NESASM3, then double click square1.bat. That will run NESASM3 and should produce the square1.nes file. Run that NES file in FCEUXD SP to listen to your beep! Edit square1.asm to change the Volume (0 to F), or to change the Duty Cycle for the square wave. Try changing the period to produce different notes.
Next Week: Square 2 and Triangle. Multiple beeps!
Music and sound effects on the NES are generated by the APU (Audio Processing Unit), the sound chip inside the CPU. The CPU "talks" to the APU through a series of I/O ports, much like it does with the PPU and joypads.
PPU: $2000-$2007
Joypads: $4016-$4017
APU: $4000-$4015, $4017
The APU has 5 channels: Square 1, Square 2, Triangle, Noise and DMC. The first four play waves and are used in just about every game. The DMC channel plays samples (pre-recorded sounds) and is used less often.
The square channels produce square waveforms. A square wave is named for its shape. It looks like this:
As you can see the wave transitions instantaneously from its high point to its low point (where the lines are vertical). This gives it a hollow sound like a woodwind or an electric guitar.
The triangle channel produces triangle waveforms. A triangle wave is also named for its shape. It looks like this:
The sound of a triangle wave is smoother and less harsh than a square wave. On the NES, the triangle channel is often used for bass lines (in low octaves) or a flute (in high octaves). It can also be used for drums.
The noise channel has a random generator, which makes the waves it produces sound like.. noise. This channel is generally used for percussion and explosion sounds.
The DMC channel plays samples, which are pre-recorded sounds. It is often used to play voice recordings ("Blades of Steel") and percussion samples. Samples take up a lot of ROM space, so not many games make use of the DMC channel.
Enabling Channels
Before you can use the channels to produce sounds, you need to enable them. Channels are toggled on and off via port $4015:
APUFLAGS ($4015)
||||+- Square 1 (0: disable; 1: enable)
|||+-- Square 2
||+--- Triangle
|+---- Noise
+----- DMC
Here are some code examples using $4015 to enable and disable channels:
lda #%00000001
sta $4015 ;enable Square 1 channel, disable others
lda #%00010110
sta $4015 ;enable Square 2, Triangle and DMC channels. Disable Square 1 and Noise.
lda #$00
sta $4015 ;disable all channels
lda #$0F
sta $4015 ;enable Square 1, Square 2, Triangle and Noise channels. Disable DMC.
;this is the most common usage.
Try opening up some of your favorite games in FCEUXD SP and set a breakpoint on writes to $4015. Take a look at what values are getting written there. If you don't know how to do this, follow these steps:
2. Load a ROM
3. Open up the Debugger by pressing F1 or going to Tools->Debugger
4. In the top right corner of the debugger, under "BreakPoints", click the "Add..." button
5. Type "4015" in the first box after "Address:"
6. Check the checkbox next to "Write"
7. Set "Memory" to "CPU Mem"
8. Leave "Condition" and "Name" blank and click "OK"
Now FCEUX will pause emulation and snap the debugger anytime your game makes a write (usually via STA) to $4015. The debugger will tell you the contents of the registers at that moment, so you can check what value will be written to $4015. Some games will write to $4015 every frame, and some only do so once at startup. Try resetting the game if your debugger isn't snapping.
What values are being written to $4015? Can you tell what channels your game is using?
Square 1 Channel
Let's make a beep. This week we'll learn how to produce a sound on the Square 1 channel. The Square channels are everybody's favorites because you can control the volume and tone and perform sweeps on them. You can produce a lot of interesting effects using the Squares.
Square 1 is controlled via ports $4000-$4003. The first port, $4000, controls the duty cycle (ie, tone) and volume for the channel. It looks like this:
SQ1_ENV ($4000)
||||++++- Volume
|||+----- Saw Envelope Disable (0: use internal counter for volume; 1: use Volume for volume)
||+------ Length Counter Disable (0: use Length Counter; 1: disable Length Counter)
++------- Duty Cycle
For our purposes, we will focus on Volume and Duty Cycle. We will set Saw Envelope Disable and Length Counter Disable to 1 and then forget about them. If we leave Saw Envelopes on, the volume of the channel will be controlled by an internal counter. If we turn them off, WE have control of the volume. If WE have control, we can code our own envelopes (much more versatile). Same thing with the Length Counter. If we disable it, we have more control over note lengths. If that didn't make sense, don't worry. It will become clearer later. For now we're just going to disable and forget about them.
Volume controls the channel's volume. It's 4 bits long so it can have a value from 0-F. A volume of 0 silences the channel. 1 is very quiet and F is loud.
Duty Cycle controls the tone of the Square channel. It's 2 bits long, so there are four possible values:
00 = a weak, grainy tone. Think of the engine sounds in RC Pro-Am. (12.5% Duty)
01 = a solid mid-strength tone. (25% Duty)
10 = a strong, full tone, like a clarinet or a lead guitar (50% Duty)
11 = sounds a lot like 01 (25% Duty negated)
The best way to know the difference in sound is to listen yourself. I recommend downloading FamiTracker and playing with the different Duty settings in the Instrument Editor.
For those interested, Duty Cycle actually refers to the percentage of time that the wave is in "up" position vs. "down" position. Here are some pictures:
Don't sweat it if graphs and waves aren't your thing. Use your ears instead.
Here's a code snippet that sets the Duty and Volume for the Square 1 channel:
lda #%10111111; Duty 10 (50%), volume F (max!)
sta $4000
$4001 controls sweeps for Square 1. We'll skip them for now.
Setting the Note
$4002 and $4003 control the period of the wave, or in other words what note you hear (A, C#, G, etc). Periods are 11-bits long. $4002 holds the low 8-bits and $4003 holds the high 3-bits of the period. We'll get into more detail in a future tutorial, but for now just know that changing the values written to these ports will change the note that is played.
SQ1_LO ($4002)
76543210
||||||||
++++++++- Low 8 bits of period
SQ1_HI ($4003)
76543210
||||||||
|||||+++- High 3 bits of period
+++++---- Length Counter
The Length Counter, if enabled, controls how long the note is played. We disabled it up in the $4000 section, so we can forget about it for now.
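To see where period values come from: on an NTSC NES the square channel's frequency is commonly given as CPU clock / (16 * (period + 1)), with the CPU running at about 1.789773 MHz. The Python helper below (hypothetical names, meant for building note tables offline, not engine code) solves that formula for the period and splits the result into the byte for $4002 and the 3 bits that go in $4003.

```python
NTSC_CPU_HZ = 1789773  # NTSC CPU clock in Hz (about 1.789773 MHz)

def square_period(freq_hz):
    """Solve f = CPU / (16 * (period + 1)) for the 11-bit square period."""
    period = round(NTSC_CPU_HZ / (16 * freq_hz)) - 1
    if not 0 <= period <= 0x7FF:
        raise ValueError("frequency out of range for an 11-bit period")
    return period

def split_period(period):
    """Return (value for $4002, high 3 bits for $4003)."""
    return period & 0xFF, (period >> 8) & 0x07

c_sharp_5 = square_period(554.37)  # C#5 is about 554.37 Hz
lo, hi = split_period(c_sharp_5)
print(f"period ${c_sharp_5:03X}: $4002 = ${lo:02X}, high bits = {hi}")
```

For C#5 this gives a period of $0C9 (low byte $C9, high bits 0), which matches the value used in the beep example.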
Here is some code that will produce an eternal beep on the Square 1 channel:
lda #%00000001
sta $4015 ;enable square 1
lda #%10111111 ;Duty 10, Volume F
sta $4000
lda #$C9 ;$0C9 is a C#5 in NTSC mode
sta $4002 ;low 8 bits of period
lda #$00
sta $4003 ;high 3 bits of period
Putting It All Together
Download and unzip the sample files. All the code above is in the square1.asm file. Make sure square1.asm and square1.bat are both in the same folder as NESASM3, then double click square1.bat. That will run NESASM3 and should produce the square1.nes file. Run that NES file in FCEUXD SP to listen to your beep! Edit square1.asm to change the Volume (0 to F), or to change the Duty Cycle for the square wave. Try changing the period to produce different notes.
Next Week: Square 2 and Triangle. Multiple beeps!
This Week: Sound Data, Pointer Tables, Headers
Designing Sound Data
We have a skeleton sound engine in place. Time to pack it with flesh and organs. Before we can play a song, we will have to load a song. Before we can load a song, we will need song data. So our next step is to decide how our sound data will look. We'll need to design our data format, create some test data and then build our engine to read and play that data.
Data Formats
So how do we go about designing a sound data format? A good place to start would be to look at what we are aiming to play. We know that our sound engine will have two basic types of sound data:
1. Music
2. Sound Effects (SFX)
Music plays in the background. It uses the first 4 channels, has a tempo, and usually loops over and over again.
Sound Effects are triggered by game events (e.g., a ball hitting a paddle) and don't loop indefinitely.
Sound effects have the job of communicating to the player what is going on right now, so they have priority over music. If there is music playing on the Square 2 channel, and a sound effect is also using the Square 2 channel, the sound effect should play instead of the music.
Depending on the game, some sound effects may have higher priority than others. For example, in a Zelda-like game the sound of the player taking damage would have priority over the sound of the player swinging their sword. The former communicates critical information to the player while the latter is just for effect.
As mentioned above, a sound effect will have to share a channel (or channels) with the music. This is unavoidable because music typically uses all the channels at once, all the time. So when a sound effect starts playing, it has to steal a channel (or more) away from the music. The music will continue to play on the other channels, but the shared channel will go to the sound effect. This creates an interesting problem: if we stop music on a channel to play a sound effect, how do we know where to resume the music on that channel when the sound effect is finished?
The answer is that we don't actually stop the music on the shared channel. We still advance it frame by frame in time with the other music channels. We just don't write its data to the APU ports when a sound effect is playing.
To do this, we will need to keep track of multiple streams of sound data. A data stream is a sequence of bytes stored in ROM that the sound engine will read and translate into APU writes. Each stream corresponds to one channel. Music will have 4 data streams - one for each channel. Sound effects will have 2 streams and the sfx themselves will choose which channel(s) they use. So 6 streams total that could potentially be running at the same time. We will number them like this:
MUSIC_SQ1 = $00 ;these are stream number constants
MUSIC_SQ2 = $01 ;stream number is used to index into stream variables (see below)
MUSIC_TRI = $02
MUSIC_NOI = $03
SFX_1 = $04
SFX_2 = $05
Each stream will need its own variables in RAM. An easy way to organize this is to reserve RAM space in blocks and use the stream number as an index:
;reserve 6 bytes each, one for each stream
stream_curr_sound .rs 6 ;what song/sfx # is this stream currently playing?
stream_channel .rs 6 ;what channel is it playing on?
stream_vol_duty .rs 6 ;volume/duty settings for this stream
stream_note_LO .rs 6 ;low 8 bits of period for the current note playing on the stream
stream_note_HI .rs 6 ;high 3 bits of the note period
Here we have 6 bytes reserved for each variable. Each stream gets its own byte, for example:
stream_vol_duty+0: MUSIC_SQ1's volume/duty settings
stream_vol_duty+1: MUSIC_SQ2's volume/duty
stream_vol_duty+2: MUSIC_TRI's on/off
stream_vol_duty+3: MUSIC_NOI's volume
stream_vol_duty+4: SFX_1's volume/duty
stream_vol_duty+5: SFX_2's volume/duty
In our sound_play_frame code we will loop through all of the streams using the stream number as an index:
ldx #$00 ;start at stream 0 (MUSIC_SQ1)
.loop:
;read from data stream in ROM if necessary
;update stream variables based on what we read
lda stream_vol_duty, x ;the value in x determines which stream we are working with
;do stuff with volume
lda stream_note_LO, x
;do stuff with note periods
;do more stuff with other variables
inx ;next stream
cpx #$06 ;loop through all six streams
bne .loop
The music streams will always be running, updating the APU ports with their data frame by frame. When a sound effect starts playing, one or both of the sfx streams will start running. Because our loop processes the SFX streams last, they will write to the APU last and thus overwrite the shared-channel music streams. Our channel conflict is taken care of automatically by the order of our loop!
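Here is a toy Python model of why the loop order settles the conflict. The dictionary standing in for the APU, the tuple layout and the $7F sound-effect value are made up for illustration: each enabled stream "writes" its volume/duty byte to its channel in stream order, and the last write to a channel in a frame is the one you hear.

```python
SQUARE_1, SQUARE_2, TRIANGLE, NOISE = 0, 1, 2, 3

def play_frame(streams):
    """streams: 6 (enabled, channel, vol_duty) tuples, indexed by stream number.
    Returns the last value written to each channel during this frame."""
    apu = {}
    for enabled, channel, vol_duty in streams:  # MUSIC_SQ1 first, SFX_2 last
        if enabled:
            apu[channel] = vol_duty  # a later stream overwrites an earlier one
    return apu

streams = [
    (True, SQUARE_1, 0xBC),   # MUSIC_SQ1
    (True, SQUARE_2, 0x38),   # MUSIC_SQ2
    (True, TRIANGLE, 0x81),   # MUSIC_TRI
    (False, NOISE, 0x00),     # MUSIC_NOI (disabled)
    (True, SQUARE_2, 0x7F),   # SFX_1 borrowing Square 2 (made-up value)
    (False, SQUARE_1, 0x00),  # SFX_2 idle
]
print(play_frame(streams))
```

Because MUSIC_SQ2 keeps running underneath, Square 2 falls back to the music's value the moment SFX_1 is disabled.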
We now have an idea of how our stream data will be stored in RAM, but there are still many unanswered questions. How do we load a song? How do we know where to find the data streams in ROM? How do we read from those data streams? How do we interpret what we read from those streams?
To answer these questions, we need to make a data format. Let's start with music. What should our music data look like? Most NES music data is divided into three types:
1. Note - what note to play: A3, G#5, C2, etc
2. Note Length - how long to play the notes: eighth note, quarter note, whole note, etc
3. Opcodes - opcodes tell the engine to perform specific tasks: loop, adjust volume, change Duty Cycle for squares, etc
*3.5. Arguments - some opcodes will take arguments as input (e.g. how many times to loop, where to loop to).
We will need to design our data format to make it easy for the sound engine to differentiate between these three types of data. We do this by specifying ranges. For example, we might say that byte values of $00-$7F represent Notes. $80-$9F are Note Lengths, and $A0-$FF are opcodes. I just made those numbers up. It really doesn't matter what values we use. The important thing is that we have ranges to test against to determine whether a byte is a note, note length or opcode. In our engine code we will have something like this:
lda [sound_ptr], y ;read a byte from the data stream
bpl .note ;if < #$80, it's a Note
cmp #$A0
bcc .note_length ;else if < #$A0, it's a Note Length
.opcode: ;else it's an opcode
;do Opcode stuff
jmp .done
.note_length:
;do Note Length stuff
jmp .done
.note:
;do Note stuff
.done:
This code reads a byte from the sound data and then tests that byte to see which range it falls into. It jumps to a different section of code for each possible range. Almost any data format you create will be divided into ranges like this, whether it be sound data, map data, text data, whatever.
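The same decision written out in Python may make the ranges easier to see. The $00-$7F / $80-$9F / $A0-$FF split is the made-up example from above, not a fixed standard; the comments note which 6502 instruction each test stands in for (CMP clears the carry when A is less than the operand, so BCC is taken).

```python
def classify(byte):
    """Model of the BPL / CMP #$A0 / BCC range test on one data byte."""
    if (byte & 0x80) == 0:   # BPL taken: bit 7 clear, so $00-$7F
        return "note"
    if byte < 0xA0:          # CMP #$A0 leaves carry clear, so BCC is taken
        return "note_length"
    return "opcode"          # everything left: $A0-$FF

for b in (0x00, 0x7F, 0x80, 0x9F, 0xA0, 0xFF):
    print(f"${b:02X} -> {classify(b)}")
```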
BPL and BMI are two more branch instructions worth learning if you don't know them already. After BEQ, BNE, BCS and BCC, they are the most common branch instructions, and they are often used in range-testing.
BPL tests the Negative (N) flag and will branch if it is clear. Think of BPL as Branch if PLus. The N flag will be clear if the last instruction executed resulted in a value less than #$80 (ie, bit7 clear).
lda #%01101011
;     |
;     +-------- bit7 is clear. This will clear the N flag.
bpl .somewhere ;N flag is clear, so this will branch
lda #%10010101
;     |
;     +-------- bit7 is set. This will set the N flag
bpl .somewhere ;N flag is set, so this will not branch.
BMI is the opposite. It tests the N flag and will branch if it is set. Think of BMI as Branch if MInus. The N flag will be set if the last instruction executed resulted in a value greater than or equal to #$80 (ie, bit7 set).
lda #%01101011
;     |
;     +-------- bit7 is clear. This will clear the N flag.
bmi .somewhere ;N flag is clear, so this will not branch
lda #%10010101
;     |
;     +-------- bit7 is set. This will set the N flag
bmi .somewhere ;N flag is set, so this will branch to the label .somewhere.
In the range-testing code above, I used BPL to check if a byte fell into the Note range (00-7F). Go back and check it.
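The names "plus" and "minus" come from signed numbers: bit 7 is the sign bit when a byte is read as two's complement. This Python sketch models only what the text describes (LDA copying bit 7 into N, and which branch is taken); the function names are made up.

```python
def n_flag(value):
    """LDA copies bit 7 of the loaded byte into the N flag."""
    return (value & 0x80) != 0

def bpl_taken(value):
    return not n_flag(value)  # Branch if PLus: N clear

def bmi_taken(value):
    return n_flag(value)      # Branch if MInus: N set

def as_signed(value):
    """Read a byte as two's complement: bit 7 set means negative."""
    return value - 0x100 if value & 0x80 else value

for v in (0b01101011, 0b10010101):
    print(f"%{v:08b}: signed {as_signed(v)}, BPL taken={bpl_taken(v)}, BMI taken={bmi_taken(v)}")
```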
Song Headers
Music on the NES is typically composed of four parts: a Square 1 part, a Square 2 part, a Triangle part and a Noise part. When you want to play a song, you will have the main program issue a command to the sound engine telling it what song you want to play. It will look something like this:
lda #$02
jsr sound_load ;load song 2
Somehow our sound_load subroutine will have to take that "2" and translate it into a whole song, complete with Square 1, Square 2, Triangle and Noise parts. How does that little number become 4 streams of data? Well, that number is an index into a pointer table, a table of pointers to song headers. The song headers themselves will contain pointers to the individual channels' data streams.
Pointer Tables
A pointer table is a special kind of lookup table. Only instead of holding regular old numerical data, a pointer table holds addresses. These addresses "point" to the start of data. Addresses on the NES are 16-bit ($0000-$FFFF), so pointer tables are always tables of words. Let's look at an example:
pointer_table:
.word $8000, $ABCD, $CC10, $DF1B
Here we have a pointer table. It's four entries long. Each entry is a 16-bit address. Presumably there is data at these four addresses that we will want to read sometime in our program. To read this data we will need to index into the pointer table, grab the address and store it in a zero-page pointer variable and then read using indirect mode:
.rsset $0000
ptr1 .rs 2 ;a 2-byte pointer variable.
;The first byte will hold the LO byte of an address
;The second byte will hold the HI byte of an address
.org $E000 ;somewhere in ROM
lda #$02 ;the third entry in the pointer table ($CC10)
asl a ;multiply by 2 because we are indexing into a table of words
tay
lda pointer_table, y ;#$10 - little endian, so words are stored LO-byte first
sta ptr1
lda pointer_table+1, y ;#$CC
sta ptr1+1
;now our pointer is set up in ptr1. It "points" to address $CC10. Let's read data from there.
ldy #$00
lda [ptr1], y ;indirect mode. reads the byte at $CC10
sta some_variable
iny
lda [ptr1], y ;reads the byte at $CC11
sta some_other_variable
;... etc
This code takes an index and uses it to read an address from our pointer_table. It stores this address in a variable called ptr1 (LO byte first). Then it reads from this address by using indirect mode. We specify indirect mode by putting []'s around our pointer variable. Look at this instruction:
lda [ptr1], y
It means "Find the address ptr1 is pointing to. Add Y to that address. Load the value at that address into A".
This is very versatile because we can stick any address we want into our ptr1 variable and read from anywhere! A pointer table is just a lookup table of places we want to read from.
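Here is the same lookup modeled in Python, with a bytearray standing in for the CPU's 64 KB address space. The table location ($E100), the data bytes and the helper names are hypothetical; what matters is doubling the index and reading the LO byte before the HI byte.

```python
memory = bytearray(0x10000)  # model of the 6502's 64 KB address space

def write_word_table(addr, words):
    """Store 16-bit words little endian (LO byte first), the way .word does."""
    for i, w in enumerate(words):
        memory[addr + 2 * i] = w & 0xFF
        memory[addr + 2 * i + 1] = w >> 8

def read_pointer(table_addr, index):
    """ASL A / TAY, then read the LO and HI bytes of entry `index`."""
    y = index * 2
    lo = memory[table_addr + y]
    hi = memory[table_addr + y + 1]
    return (hi << 8) | lo

def read_indirect_y(ptr, y):
    """LDA [ptr], y: read the byte at (pointer + Y)."""
    return memory[(ptr + y) & 0xFFFF]

POINTER_TABLE = 0xE100  # hypothetical location of pointer_table
write_word_table(POINTER_TABLE, [0x8000, 0xABCD, 0xCC10, 0xDF1B])
memory[0xCC10], memory[0xCC11] = 0x42, 0x99  # made-up data to point at

ptr1 = read_pointer(POINTER_TABLE, 2)
print(hex(ptr1), hex(read_indirect_y(ptr1, 0)), hex(read_indirect_y(ptr1, 1)))
```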
Of course you usually won't know exactly where in the ROM your data will be. So instead of declaring addresses explicitly ($8000, $ABCD, $CC10, etc), you will use labels:
pointer_table:
.word data1, data2, data3 ;these entries will evaluate to the 16-bit addresses of the labels below
data1:
.byte $FF, $16, $82, $44 ;some random data
data2:
.byte $0E, $EE, $EF, $16, $23
data3:
.byte $00, $01
Song Header Pointer Table
Our songs work the same way. When the main program tells the sound engine to play a song, it will send the song number with it. This song number is actually an index into a pointer table of song headers:
song_headers:
.word song0_header
.word song1_header
.word song2_header
;(A holds the song number)
asl a ;multiply by 2. we are indexing into a table of pointers (words)
tay
lda song_headers, y ;read LO byte of a pointer from the pointer table.
sta sound_ptr ;sound_ptr is a zero page pointer variable
lda song_headers+1, y ;read HI byte
sta sound_ptr+1
ldy #$00
lda [sound_ptr], y
sta some_variable
; the rest of the header data
Header Data
So what will our song header data look like? At the very least it should tell us:
How many data streams we have (songs will usually have 4, but sfx will have fewer)
Which streams those are (which stream index to use)
Which channels those streams use
Where to find those streams (ie, pointers to the beginning of each stream).
Initial values for those streams (for example, initial volume)
As we add more features to our sound engine, we may expand our headers to initialize those features. Let's start simple. Our headers will look like this:
main header:
byte # | what it tells us
00 | number of streams
01+ | stream headers (one for each stream)
stream headers:
byte # | what it tells us
00 | which stream (stream number)
01 | status byte (see below)
02 | which channel
03 | initial volume (and duty for squares)
04-05 | pointer to data stream
The status byte will be a bit-flag that tells us special information about the stream. For now we will just use bit0 to mark a stream as enabled or disabled. In the future we may use other bits to store other information, such as stream priority.
Stream Status Byte
76543210
       |
       +- Enabled (0: stream disabled; 1: enabled)
Sample Header
Here is some code showing a sample header:
SQUARE_1 = $00 ;these are channel constants
SQUARE_2 = $01
TRIANGLE = $02
NOISE = $03
MUSIC_SQ1 = $00 ;these are stream # constants
MUSIC_SQ2 = $01 ;stream # is used to index into stream variables
MUSIC_TRI = $02
MUSIC_NOI = $03
SFX_1 = $04
SFX_2 = $05
song0_header:
.byte $04 ;4 streams
.byte MUSIC_SQ1 ;which stream
.byte $01 ;status byte (stream enabled)
.byte SQUARE_1 ;which channel
.byte $BC ;initial volume (C) and duty (10)
.word song0_square1 ;pointer to stream
.byte MUSIC_SQ2 ;which stream
.byte $01 ;status byte (stream enabled)
.byte SQUARE_2 ;which channel
.byte $38 ;initial volume (8) and duty (00)
.word song0_square2 ;pointer to stream
.byte MUSIC_TRI ;which stream
.byte $01 ;status byte (stream enabled)
.byte TRIANGLE ;which channel
.byte $81 ;initial volume (on)
.word song0_tri ;pointer to stream
.byte MUSIC_NOI ;which stream
.byte $00 ;disabled. We will have our load routine skip the
; rest of the reads if the status byte disables the stream.
; We are disabling Noise because we haven't covered it yet.
;these are the actual data streams that are pointed to in our stream headers.
song0_square1:
.byte A3, C4, E4, A4, C5, E5, A5 ;some notes. A minor
song0_square2:
.byte A3, A3, A3, E4, A3, A3, E4 ;some notes to play on square 2
song0_tri:
.byte A3, A3, A3, A3, A3, A3, A3 ;triangle data
Sound Engine Variables
The last thing we need before we can write our sound_load routine is some variables. As mentioned, our sound engine will have several streams running simultaneously. Four will be used for music (one for each of the four channels). Two will be used for sound effects. So we will declare all variables in blocks of 6. Based on our header data, we will need the following variables:
stream_curr_sound .rs 6 ;reserve 6 bytes, one for each stream
stream_status .rs 6
stream_channel .rs 6
stream_vol_duty .rs 6
stream_ptr_LO .rs 6
stream_ptr_HI .rs 6
Now let's write some code to read our header. Pay special attention to the X register. I recommend tracing through the code using the sample header above. Here is our sound_load routine:
; sound_load will prepare the sound engine to play a song or sfx.
; input:
; A: song/sfx number to play
sound_load:
sta sound_temp1 ;save song number
asl a ;multiply by 2. We are indexing into a table of pointers (words)
tay
lda song_headers, y ;set up the pointer to our song header
sta sound_ptr
lda song_headers+1, y
sta sound_ptr+1
ldy #$00
lda [sound_ptr], y ;read the first byte: # streams
sta sound_temp2 ;store in a temp variable. We will use this as a loop counter
iny
.loop:
lda [sound_ptr], y ;stream number
tax ;stream number acts as our variable index
iny
lda [sound_ptr], y ;status byte. 1= enable, 0=disable
sta stream_status, x
beq .next_stream ;if status byte is 0, stream disabled, so we are done
iny
lda [sound_ptr], y ;channel number
sta stream_channel, x
iny
lda [sound_ptr], y ;initial duty and volume settings
sta stream_vol_duty, x
iny
lda [sound_ptr], y ;pointer to stream data. Little endian, so low byte first
sta stream_ptr_LO, x
iny
lda [sound_ptr], y
sta stream_ptr_HI, x
.next_stream:
iny ;move to the first byte of the next stream header
lda sound_temp1 ;song number
sta stream_curr_sound, x
dec sound_temp2 ;our loop counter
bne .loop
rts
Now our sound_load routine is ready. If the main program calls it, like this:
lda #$00 ;song 0
jsr sound_load
Our sound_load routine will take the value in the A register and use it to fill our music RAM with everything we need to get our song running!
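To trace the header format without an emulator, here is a Python model of sound_load. It is a sketch under a few assumptions: the header is a plain list of byte values, stream RAM is a dict of 6-entry lists, and the stream data addresses ($C100, $C120, $C140) are made up. Like the assembly, a disabled stream only consumes its stream number and status byte.

```python
NUM_STREAMS = 6
FIELDS = ("curr_sound", "status", "channel", "vol_duty", "ptr_lo", "ptr_hi")

def new_stream_ram():
    """One 6-entry block per variable, like the .rs 6 declarations."""
    return {field: [0] * NUM_STREAMS for field in FIELDS}

def sound_load(header, song_number, ram):
    """Walk a song header the way sound_load does; return the bytes consumed."""
    y = 0
    streams_left = header[y]  # first byte: number of streams
    y += 1
    while streams_left:
        x = header[y]         # stream number is the variable index
        y += 1
        ram["status"][x] = header[y]
        if header[y] != 0:    # enabled: read channel, vol/duty and pointer
            ram["channel"][x] = header[y + 1]
            ram["vol_duty"][x] = header[y + 2]
            ram["ptr_lo"][x] = header[y + 3]  # little endian: LO byte first
            ram["ptr_hi"][x] = header[y + 4]
            y += 4
        y += 1                # .next_stream: step past the last byte read
        ram["curr_sound"][x] = song_number
        streams_left -= 1
    return y

# The sample header from above, with made-up data addresses.
song0_header = [
    0x04,                                # 4 streams
    0x00, 0x01, 0x00, 0xBC, 0x00, 0xC1,  # MUSIC_SQ1 on SQUARE_1, data at $C100
    0x01, 0x01, 0x01, 0x38, 0x20, 0xC1,  # MUSIC_SQ2 on SQUARE_2, data at $C120
    0x02, 0x01, 0x02, 0x81, 0x40, 0xC1,  # MUSIC_TRI on TRIANGLE, data at $C140
    0x03, 0x00,                          # MUSIC_NOI, disabled
]
ram = new_stream_ram()
print(sound_load(song0_header, 0, ram), ram["vol_duty"])
```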
Reading Streams
Once we have our header loaded, we are ready to rock. All of our active streams have pointers to their data stored in their stream_ptr_LO and stream_ptr_HI variables. That's all we need to start reading data from them.
To read data from our data stream, we will first copy the stream pointer into a zero-page pointer variable. Then we will read a byte using indirect mode and range-test it to determine whether it is a note, note length or opcode. If it's a note, we will read from our note_table and store the 11-bit period in RAM. Finally, we will update our stream pointer to point to the next byte in the stream.
First we will need to declare some new variable blocks for the note periods:
stream_note_LO .rs 6 ;low 8 bits of period
stream_note_HI .rs 6 ;high 3 bits of period
Here is our se_fetch_byte routine (se_ stands for "sound engine"):
; se_fetch_byte reads one byte from a sound data stream and handles it
; input:
; X: stream number
se_fetch_byte:
lda stream_ptr_LO, x ;copy stream pointer into a zero page pointer variable
sta sound_ptr
lda stream_ptr_HI, x
sta sound_ptr+1
ldy #$00
lda [sound_ptr], y ;read a byte using indirect mode
bpl .note ;if <#$80, we have a note
cmp #$A0 ;else if <#$A0 we have a note length
bcc .note_length
.opcode: ;else we have an opcode
;nothing here yet
jmp .update_pointer
.note_length:
;nothing here yet
jmp .update_pointer
.note:
asl ;multiply by 2 because we are indexing into a table of words
sty sound_temp1 ;save our Y register because we are about to destroy it
tay
lda note_table, y ;pull low 8 bits of period and store it in RAM
sta stream_note_LO, x
lda note_table+1, y ;pull high 3 bits of period from our note table
sta stream_note_HI, x
ldy sound_temp1 ;restore the Y register
.update_pointer:
;update our stream pointers to point to the next byte in the data stream
iny ;set index to the next byte in the data stream
tya
clc
adc stream_ptr_LO, x ;add Y to the LO pointer
sta stream_ptr_LO, x
bcc .end
inc stream_ptr_HI, x ;if there was a carry, add 1 to the HI pointer.
.end:
rts
Look at the part that updates the stream pointer. After we finish all our reads, Y will hold the index of the last byte read. To be ready for the next frame, we will want to update our pointer to point to the next byte in the data stream. To do this, we increment Y and add it to the pointer. But we have to be careful here. What if our current position is something like this:
stream_ptr: $C3FF
Y: 1
The next position here should be $C400. But ADC only works on the 8-bit level, so if we add 1 to the low byte of the pointer we will get this instead:
stream_ptr: $C300
The FF in the low byte becomes 00, but the high byte remains the same. We need to increment the high byte manually. But how do we know when to increment it and when to leave it alone? Lucky for us, ADC sets the carry flag whenever the sum wraps past $FF (an FF->00 transition), as long as we clear the carry with CLC before adding. So we can just check the carry flag after our addition. If it is set, increment the high byte of the pointer. If it is clear, don't increment it. That's what our code above does.
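The carry trick can be checked with a few lines of Python. This models the pointer update on 8-bit values (the function names are made up) and contrasts it with a version that ignores the carry, using the $C3FF example from the text.

```python
def add_to_pointer(ptr_lo, ptr_hi, y):
    """Model of TYA / CLC / ADC ptr_LO / STA ptr_LO / BCC .end / INC ptr_HI."""
    total = ptr_lo + y             # CLC first, so no stale carry is added
    carry = total > 0xFF           # ADC sets carry when the sum wraps past $FF
    ptr_lo = total & 0xFF
    if carry:
        ptr_hi = (ptr_hi + 1) & 0xFF  # INC stream_ptr_HI
    return ptr_lo, ptr_hi

def naive_add(ptr_lo, ptr_hi, y):
    """The buggy version: ignores the carry, so the high byte never changes."""
    return (ptr_lo + y) & 0xFF, ptr_hi

print([hex(b) for b in add_to_pointer(0xFF, 0xC3, 1)])
print([hex(b) for b in naive_add(0xFF, 0xC3, 1)])
```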
Playing Music
We've loaded our header. We've set up our stream pointers in RAM. We've written a routine that will read bytes from the streams and turn them into notes. Now we need to update sound_play_frame. sound_play_frame will loop through all 6 streams. It will check the status byte to see if they are enabled. If enabled, it will advance the stream by one frame. Here's the code:
sound_play_frame:
lda sound_disable_flag
bne .done ;if sound engine is disabled, don't advance a frame
inc sound_frame_counter
lda sound_frame_counter
cmp #$08 ;***change this compare value to make the notes play faster or slower***
bne .done ;only take action once every 8 frames.
ldx #$00 ;our stream index. start at MUSIC_SQ1 stream
.loop:
lda stream_status, x ;check bit 0 to see if stream is enabled
and #$01
beq .next_stream ;if disabled, skip to next stream
jsr se_fetch_byte ;read from the stream and update RAM
jsr se_set_apu ;write volume/duty, sweep, and note periods of current stream to the APU ports
.next_stream:
inx
cpx #$06 ;loop through all 6 streams.
bne .loop
lda #$00
sta sound_frame_counter ;reset frame counter so we can start counting to 8 again.
.done:
rts
And here is se_set_apu which will write a stream's data to the APU ports:
se_set_apu:
lda stream_channel, x ;which channel does this stream write to?
asl a
asl a ;multiply by 4 so Y will index into the right set of APU ports (see below)
tay
lda stream_vol_duty, x
sta $4000, y
lda stream_note_LO, x
sta $4002, y
lda stream_note_HI, x
sta $4003, y
lda stream_channel, x
cmp #TRIANGLE
bcs .end ;if Triangle or Noise, skip this part
lda #$08 ;else, set negate flag in sweep unit to allow low notes on Squares
sta $4001, y
.end:
rts
Writing to the APU ports directly like this is actually bad form. We'll learn why in a later lesson.
One thing to pay attention to is how we get our APU port index. We take the channel and multiply it by 4. Recall that we declared constants for our channels:
SQUARE_1 = $00 ;these are channel constants
SQUARE_2 = $01
TRIANGLE = $02
NOISE = $03
If our stream_channel is $00 (SQUARE_1), we multiply by 4 to get $00. y = 0
$4000, y = $4000
$4001, y = $4001
$4002, y = $4002
$4003, y = $4003
If our stream_channel is $01 (SQUARE_2), we multiply by 4 to get $04. y = 4
$4000, y = $4004
$4001, y = $4005
$4002, y = $4006
$4003, y = $4007
If our stream_channel is $02 (TRIANGLE), we multiply by 4 to get $08. y = 8
$4000, y = $4008
$4001, y = $4009 (unused)
$4002, y = $400A
$4003, y = $400B
If our stream_channel is $03 (NOISE), we multiply by 4 to get $0C. y = C
$4000, y = $400C
$4001, y = $400D
$4002, y = $400E
$4003, y = $400F
See how everything lines up nicely?
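The same table can be generated in Python as a quick check. This is a model of the index arithmetic only, with made-up function names; writes_sweep mirrors the Triangle/Noise check that skips the $4001,y write.

```python
SQUARE_1, SQUARE_2, TRIANGLE, NOISE = 0, 1, 2, 3

def apu_ports(channel):
    """Addresses hit by $4000,y .. $4003,y when y = channel * 4 (two ASLs)."""
    y = channel << 2
    return [0x4000 + y + reg for reg in range(4)]

def writes_sweep(channel):
    """Only the squares get the $4001,y sweep write."""
    return channel < TRIANGLE

for ch in (SQUARE_1, SQUARE_2, TRIANGLE, NOISE):
    print(ch, [f"${a:04X}" for a in apu_ports(ch)], writes_sweep(ch))
```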
Putting It All Together
Download and unzip the sample files. Make sure the following files are in the same folder as NESASM3:
Double click headers.bat. That will run NESASM3 and should produce the headers.nes file. Run that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:
Up: Play
Down: Stop
Right: Next Song
Left: Previous Song
Song0 is a silence song. It is not selectable. headers.asm "plays" song0 to stop the music when you press down. See song0.i to find out how it works.
Song1 is an evil sounding series of minor thirds.
Song2 is a short sound effect on the Sq2 channel. It uses the SFX_1 stream. Try playing it over the other songs to see how it steals the channel from the music.
Song3 is a simple descending chord progression.
Try creating your own songs and sound effects and add them into the mix. To add a new song you will need to take the following steps:
1) create a song header and song data (use the included songs as reference). Note that data streams are terminated with $FF
2) add your header to the song_headers pointer table at the bottom of sound_engine.asm
3) update the constant NUM_SONGS to reflect the new song number total (also at the bottom of sound_engine.asm)
Although not necessary, I recommend keeping your song data in a separate file like I've done with song0.i, song1.i, song2.i and song3.i. This makes it easier to find the data if you need to edit your song later. If you do this, don't forget to .include your file.
Next Week: Timing, Note Lengths, Buffering and Rests
Nerdy Nights Sound: Part 6: Tempo, Note Lengths, Buffering and Rests
Last Week: Sound Data, Pointer Tables and Headers
This Week: Tempo, Note Lengths, Buffering and Rests
Last week we put together a huge chunk of the sound engine. We finally got it to play something that resembled a song. But we had big limitations when it came to timing and note lengths. We were using a single frame counter to keep time across all 6 streams of the sound engine. This is a problem because it imposes the music's speed on our sound effects. If you were to use such a system in a real game, your sound effects would speed up or slow down whenever you change to a faster or slower song.
We also made the mistake of advancing the sound engine when our frame counter hit its mark, but skipping it when it didn't. What happens if a game event triggers a sound effect one or two frames after the counter hits its mark? The sound effect won't start until the next time our counter reaches its mark - we have to wait for it! There will be a delay. Not good.
Worst of all, our frame counter also doesn't allow for variable note lengths. Unless every song you write is going to consist of only 32nd notes, this is a problem. It becomes apparent then that we need a more complex timing system.
We'll correct the first two problems by ripping out the universal counter and giving each stream its own private counter. We'll also change our method of counting. Our old method of counting frames and taking action when we reach a certain number is very limited. For example, let's say that we have a song and our frame counter is taking action every 4 frames. Maybe the song sounds a tad faster than we want, so we slow it down by changing the speed to update once every 5 frames. But now the song sounds too slow. The speed we really want is somewhere in between 4 and 5, but we can't get there with our frame counting method. Instead we'll use a ticker.
The ticker method involves taking a number (a tempo) and adding it to a total, frame by frame. Eventually, that total will wrap around from FF->00 and when it does the carry flag will be set (a tick). This carry flag tick will be the signal we look for to advance our stream.
For example, let's say our tempo value is $40 and our total starts at $00. After one frame we will add our tempo to the total. $00 + $40 = $40. Now our total is $40. Another frame goes by (2). We add our tempo to the total again. $40 + $40 = $80. Our total is $80. Another frame goes by (3). $80 + $40 = $C0. Another frame goes by (4). $C0 + $40 = $00. Carry flag is set. TICK! A tick tells us that it is time to advance this stream. When we finish updating, we start adding again until we get another tick.
As you can see, a tempo value of $40 will advance our stream once every 4 frames. If you do some math (256 / 5), you will discover that a tempo of $33 will advance the stream roughly every 5 frames. If $40 is too fast for your song and $33 is too slow, you still have the values $34-$39 to experiment with. Much more versatile! To see why this works, let's see what happens with a tempo value of say $36:
$00 + $36 + $36 + $36 + $36 + $36 = $0E (Tick in 5 frames)
$0E + $36 + $36 + $36 + $36 + $36 = $1C (Tick in 5 frames)
$1C + $36 + $36 + $36 + $36 + $36 = $2A (Tick in 5 frames)
$2A + $36 + $36 + $36 + $36 = $02 (Tick in 4 frames)
$02 + $36 + $36 + $36 + $36 + $36 = $10 (Tick in 5 frames)
A tempo of $36 produces a tick every 5 frames most of the time, but sometimes it only takes 4 frames. You might think that this disparity would make our song sound uneven, but really a single frame only lasts about 1/60 of a second. Our ears won't notice. It will sound just right to us.
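The trace above is easy to reproduce. This Python sketch (a model, not engine code; simulate_ticker is a made-up name) adds the tempo to an 8-bit total each frame and records a tick whenever the addition carries, i.e. wraps past $FF.

```python
def simulate_ticker(tempo, frames, total=0):
    """Add tempo to an 8-bit total each frame; a carry out of $FF is a tick.
    Returns a list of (frame number, total right after the tick)."""
    ticks = []
    for frame in range(1, frames + 1):
        total += tempo
        carry = total > 0xFF
        total &= 0xFF
        if carry:
            ticks.append((frame, total))
    return ticks

print(simulate_ticker(0x40, 12))
ticks = simulate_ticker(0x36, 24)
frames = [f for f, _ in ticks]
print(frames, [f"${t:02X}" for _, t in ticks])
print([b - a for a, b in zip([0] + frames, frames)])  # frames between ticks
```

On average a tempo value t gives a tick every 256 / t frames, so $36 averages about 4.74 frames per tick, which is why the 5-frame gaps are occasionally broken up by a 4-frame one.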
Here is some code that demonstrates how to implement a ticker:
stream_tempo .rs 6 ;the value to add to our ticker total each frame
stream_ticker_total .rs 6 ;our running ticker total.
sound_play_frame:
lda sound_disable_flag
bne .done ;if disable flag is set, don't advance a frame
ldx #$00
.loop:
lda stream_status, x
and #$01
beq .endloop ;if stream disabled, skip this stream
;add the tempo to the ticker total. If there is a FF-> 0 transition, there is a tick
lda stream_ticker_total, x
clc
adc stream_tempo, x
sta stream_ticker_total, x
bcc .endloop ;carry clear = no tick. if no tick, we are done with this stream
jsr se_fetch_byte ;else there is a tick, so do stuff
;do more stuff
.endloop:
inx
cpx #$06
bne .loop
.done:
rts
Anytime we add a new feature to our sound engine we will want to ask ourselves the following questions:
1) Is this a feature that needs to be initialized for each song/sfx?
2) If so, are the values we use to initialize the feature variable (ie, not necessarily the same for every song/sfx)?
If the answer to question #1 is yes, we will have to update sound_load to initialize the feature.
If the answer to question #2 is also yes, we will have to add a field to the song header format. The values to plug into the initialization are different for each song, so the songs' headers will need to provide those values for us.
In the case of our new timing scheme, we have two variables that need to be initialized: stream_ticker_total and stream_tempo. Of the two, only stream_tempo will be variable. Different songs will have different tempos, but they won't need to have different starting stream_ticker_totals. So we will have to add one new field to our song header format for tempo:
main header:
byte # | what it tells us
00 | number of streams
01+ | stream headers (one for each stream)
stream headers:
byte # | what it tells us
00 | which stream (stream number)
01 | status byte
02 | which channel
03 | initial volume (and duty for squares)
04-05 | pointer to data stream
06 | initial tempo
Then we will need to edit sound_load to read this new byte for each stream and store it in RAM. We'll also want to initialize stream_ticker_total to some fixed starting value, preferably a high one so that the first tick will happen without a delay. Finally, we will have to update all of our songs to include tempos in their headers.
Note Lengths
We still have the problem of note lengths. Songs are made up of notes of variable length: quarter notes, eighth notes, sixteenth notes, etc. Our sound engine needs to be able to differentiate between different note lengths. But how? We will use note length counters.
Note Length Counters
Think of the fastest note you'd ever need to play, say a 32nd note. Since that will be our fastest note, we'll give it the smallest count possible: $01. The next fastest note is a 16th note. In music, a 16th note equals two 32nd notes. In other words, a 16th note lasts twice as long as a 32nd note. So we will give it a count value that is twice the count value of our 32nd note: $02. The next fastest note is an 8th note. An 8th note equals two 16th notes. It is twice as long as a 16th note. So its count value will be twice that of the 16th note: $04. Going all the way up to a whole note, we can produce a lookup table like this:
note_length_table:
.byte $01 ;32nd note
.byte $02 ;16th note
.byte $04 ;8th note
.byte $08 ;quarter note
.byte $10 ;half note
.byte $20 ;whole note
We'll add more entries later for things like dotted quarter notes, but for now this is sufficient to get us started.
To play different note lengths, we will give each stream a note length counter:
stream_note_length_counter .rs 6
When a note is played, for example an 8th note, its count value will be pulled from the note_length_table and stored in the stream's note length counter. Then every time a tick occurs we will decrement the counter. When the note length counter reaches 0, it will signal to us that our note has finished playing and it is time for the next note. To say it another way, a note's count value is simply how many ticks it lasts. An eighth note is 4 ticks long. A quarter note is 8 ticks long. A half note is 16 ticks long ($10).
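Putting the ticker and the note length counter together, this Python sketch (hypothetical names; it assumes the counter is decremented once per tick, as described above) counts how many frames each note lasts. With a tempo of $40 there is a tick every 4 frames, so a quarter note (8 ticks) lasts 32 frames.

```python
NOTE_LENGTH_TABLE = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20]  # 32nd .. whole
NAMES = ("32nd", "16th", "8th", "quarter", "half", "whole")

def note_duration_frames(count, tempo, total=0):
    """Frames until a note with `count` ticks ends (counter decremented on each tick)."""
    counter = count
    frames = 0
    while counter > 0:
        frames += 1
        total += tempo
        if total > 0xFF:  # carry out of the ticker: a tick
            total &= 0xFF
            counter -= 1
    return frames

for name, count in zip(NAMES, NOTE_LENGTH_TABLE):
    print(name, note_duration_frames(count, 0x40))
```

Starting the ticker total high (for example $FF, as suggested earlier) makes the first tick land on the very first frame.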
Note Lengths in Data
Now we need to add note lengths to our sound data. Recall that we specified that byte values in the range of $80-$9F were note lengths:
se_fetch_byte:
    lda stream_ptr_LO, x
    sta sound_ptr
    lda stream_ptr_HI, x
    sta sound_ptr+1
    ldy #$00
.fetch:
    lda [sound_ptr], y
    bpl .note                ;if < #$80, it's a Note
    cmp #$A0
    bcc .note_length         ;else if < #$A0, it's a Note Length
.opcode:                     ;else it's an opcode
    ;do Opcode stuff
.note_length:
    ;do note length stuff
.note:
    ;do note stuff
So the first byte value that we can use for note lengths is $80. We are going to be reading from a lookup table (note_length_table above), so we should assign the bytes in the same order as the lookup table.
byte | note length
$80 | 32nd note
$81 | 16th note
$82 | 8th note
$83 | quarter note
$84 | half note
$85 | whole note
Now we can use these values in our sound data to represent note lengths:
;music data for song 0, square 1 channel
.byte $82, C3 ;play a C eighth note
.byte $84, D5 ;play a D half note
Of course, memorizing which byte value corresponds to which note length is a pain. Let's create some aliases to make it easier on us when we are creating our sound data:
;note length constants
thirtysecond = $80
sixteenth = $81
eighth = $82
quarter = $83
half = $84
whole = $85
.byte eighth, C3 ;play a C eighth note
.byte half, D5 ;play a D half note
Pulling from the table
There is a small problem here. Lookup tables index from 0. This wasn't a problem for note values (C5, D3, G6) because our note range started from 0 ($00-$7f). But our note length data has a range of $80-$9F. Somehow we will need to translate the note length byte that comes from the data stream into a number we can use to index into our table. In other words, we need to figure out a way to turn $80 into $00, $81 into $01, $82 into $02, etc. Anything come to mind?
If you thought "just subtract $80 from the note length value", give yourself a cookie. If you thought "just chop off bit 7", give yourself two cookies. Both solutions work, but the second solution is a little bit faster and only takes one instruction to perform:
se_fetch_byte:
    lda stream_ptr_LO, x
    sta sound_ptr
    lda stream_ptr_HI, x
    sta sound_ptr+1
    ldy #$00
.fetch:
    lda [sound_ptr], y
    bpl .note                ;if < #$80, it's a Note
    cmp #$A0
    bcc .note_length         ;else if < #$A0, it's a Note Length
.opcode:                     ;else it's an opcode
    ;do Opcode stuff
.note_length:
    ;do note length stuff
    and #%01111111           ;chop off bit 7
    sty sound_temp1          ;save Y because we are about to destroy it
    tay                      ;use the translated value as our table index
    lda note_length_table, y ;get the note length count value
    sta stream_note_length_counter, x ;stick it in our note length counter
    ldy sound_temp1          ;restore Y
    iny                      ;set index to next byte in the stream
    jmp .fetch               ;fetch another byte
.note:
    ;do note stuff
Notice that we jump back up to .fetch after we set the note length counter. This is so that we can read the note that will surely follow the note length in the data stream. If we simply stop after setting the note length, we'll know how long to play, but we won't know which note to play!
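The claim that "subtract $80" and "chop off bit 7" give the same table index for every note length byte can be checked directly. This C snippet is only an illustrative check, and the function names are hypothetical. On the 6502, the subtraction would need SEC plus SBC, while AND #%01111111 is a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the two ways of turning $80-$9F into $00-$1F. */
static uint8_t index_by_subtract(uint8_t b) {
    assert(b >= 0x80 && b < 0xA0);   /* only note length bytes reach here */
    return (uint8_t)(b - 0x80);
}

static uint8_t index_by_mask(uint8_t b) {
    assert(b >= 0x80 && b < 0xA0);
    return (uint8_t)(b & 0x7F);      /* same as AND #%01111111 */
}

/* Returns 1 if both methods agree for every byte in $80-$9F. */
static int translations_agree(void) {
    for (unsigned b = 0x80; b <= 0x9F; b++) {
        if (index_by_subtract((uint8_t)b) != index_by_mask((uint8_t)b))
            return 0;
    }
    return 1;
}
```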
Here's an updated sound_play_frame routine that implements both the ticker and the note length counters. Notice how the note length counter is only decremented when we have a tick, and we only advance the stream when the note length counter reaches zero:
sound_play_frame:
    lda sound_disable_flag
    bne .done                ;if disable flag is set, don't advance a frame
    ldx #$00
.loop:
    lda stream_status, x
    and #$01
    beq .endloop             ;if stream disabled, skip this stream
    ;add the tempo to the ticker total. If it wraps past $FF (carry set), there is a tick
    lda stream_ticker_total, x
    clc
    adc stream_tempo, x
    sta stream_ticker_total, x
    bcc .endloop             ;carry clear = no tick. if no tick, we are done with this stream
    dec stream_note_length_counter, x ;else there is a tick. decrement the note length counter
    bne .endloop             ;if counter is non-zero, our note isn't finished playing yet
    jsr se_fetch_byte        ;else our note is finished. Time to read from the data stream
    ;do more stuff. set volume, note, sweep, etc
.endloop:
    inx
    cpx #$06
    bne .loop
.done:
    rts
We have one last change to make. When we load a new song we will want it to start playing immediately, so we should initialize the stream_note_length_counter in the sound_load routine to do just that. Our sound_play_frame routine decrements the counter and takes action if the result is zero. Therefore, to ensure that our song starts immediately, we should initialize our stream_note_length_counter to $01:
;somewhere inside the loop of sound_load
lda #$01
sta stream_note_length_counter, x
And now our engine supports note lengths. But there is still room for improvement. What if we want to play a series of 8th notes? Not an uncommon thing to have in music. Here is how our data would have to look now:
.byte eighth, C5, eighth, E5, eighth, G5, eighth, C6, eighth, E6, eighth, G6, eighth, C7 ;Cmajor
That's a lot of "eighth" bytes. Wouldn't it be better to just state "eighth" once, and assume that all notes following it are eighth notes? Like this:
.byte eighth, C5, E5, G5, C6, E6, G6, C7 ;Cmajor
That saved us 6 bytes of ROM space. And if you consider that a game may have 20+ songs, each with 4 streams of data, each with potentially several strings of equal-length notes, this kind of change might save us hundreds, maybe even thousands of bytes! Let's do it.
To pull this off, we will have to store the current note length count value in RAM. Then when our note length counter runs to 0, we will refill it with our RAM count value.
stream_note_length .rs 6 ;note length count value
se_fetch_byte:
    lda stream_ptr_LO, x
    sta sound_ptr
    lda stream_ptr_HI, x
    sta sound_ptr+1
    ldy #$00
.fetch:
    lda [sound_ptr], y
    bpl .note                ;if < #$80, it's a Note
    cmp #$A0
    bcc .note_length         ;else if < #$A0, it's a Note Length
.opcode:                     ;else it's an opcode
    ;do Opcode stuff
.note_length:
    ;do note length stuff
    and #%01111111           ;chop off bit 7
    sty sound_temp1          ;save Y because we are about to destroy it
    tay                      ;use the translated value as our table index
    lda note_length_table, y ;get the note length count value
    sta stream_note_length, x ;save the note length in RAM so we can use it to refill the counter
    sta stream_note_length_counter, x ;stick it in our note length counter
    ldy sound_temp1          ;restore Y
    iny                      ;set index to next byte in the stream
    jmp .fetch               ;fetch another byte
.note:
    ;do note stuff
sound_play_frame:
    lda sound_disable_flag
    bne .done                ;if disable flag is set, don't advance a frame
    ldx #$00
.loop:
    lda stream_status, x
    and #$01
    beq .endloop             ;if stream disabled, skip this stream
    ;add the tempo to the ticker total. If it wraps past $FF (carry set), there is a tick
    lda stream_ticker_total, x
    clc
    adc stream_tempo, x
    sta stream_ticker_total, x
    bcc .endloop             ;carry clear = no tick. if no tick, we are done with this stream
    dec stream_note_length_counter, x ;else there is a tick. decrement the note length counter
    bne .endloop             ;if counter is non-zero, our note isn't finished playing yet
    lda stream_note_length, x ;else our note is finished. reload the note length counter
    sta stream_note_length_counter, x
    jsr se_fetch_byte        ;Time to read from the data stream
    ;do more stuff
.endloop:
    inx
    cpx #$06
    bne .loop
.done:
    rts
Adding those 4 lines of code just saved us hundreds of bytes of ROM space. A nice tradeoff for 6 bytes of RAM. Now our data will be made up of "strings", where we have a note length followed by a series of notes:
.byte eighth, C5, E5, G5, C6, E6, G6, quarter, C6 ;six 8th notes and a quarter note
Easy to read and easy to write.
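The "strings" format works because the last note length stays in effect until another note length byte appears. Below is a hypothetical C decoder that sketches this rule; it is not part of the engine. The local current_length stands in for stream_note_length, and the quarter-note default is an assumption of this sketch. Opcode bytes ($A0-$FF) simply stop decoding here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint8_t note_length_table[] = { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20 };

typedef struct { uint8_t note; uint8_t ticks; } NoteEvent;

/* Decode notes and note lengths. A length byte applies to every following
   note until the next length byte. Returns the number of notes decoded. */
static size_t decode_stream(const uint8_t *data, size_t len,
                            NoteEvent *out, size_t max_out) {
    uint8_t current_length = note_length_table[3]; /* hypothetical default: quarter */
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t b = data[i];
        if (b < 0x80) {                          /* note */
            assert(count < max_out);
            out[count].note = b;
            out[count].ticks = current_length;   /* reuse the remembered length */
            count++;
        } else if (b < 0xA0) {                   /* note length */
            uint8_t idx = b & 0x7F;
            assert(idx < sizeof note_length_table);
            current_length = note_length_table[idx];
        } else {
            break;                               /* opcode: stop in this sketch */
        }
    }
    return count;
}
```

For the bytes $82, $10, $11, $12, $83, $20, $FF, the first three notes each last 4 ticks and the last one lasts 8.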
Other Note Lengths
Now that everything is set up, we can add more note lengths to our note_length_table. Dotted notes are very common in music. Dotted notes are equal in length to the note plus the next fastest note. For example, a dotted quarter note = a quarter note + an 8th note. A dotted 8th note = an 8th note + a 16th note. Let's add some dotted notes to our table:
note_length_table:
    .byte $01 ;32nd note
    .byte $02 ;16th note
    .byte $04 ;8th note
    .byte $08 ;quarter note
    .byte $10 ;half note
    .byte $20 ;whole note
    ;---dotted notes
    .byte $03 ;dotted 16th note
    .byte $06 ;dotted 8th note
    .byte $0C ;dotted quarter note
    .byte $18 ;dotted half note
    .byte $30 ;dotted whole note?
The actual order of our note_length_table doesn't matter. We just have to make sure our aliases are in the same order as the table:
;note length constants (aliases)
thirtysecond = $80
sixteenth = $81
eighth = $82
quarter = $83
half = $84
whole = $85
d_sixteenth = $86
d_eighth = $87
d_quarter = $88
d_half = $89
d_whole = $8A ;don't forget we are counting in hex
Your music will determine what other entries you'll need to add to your note length table. If one of your songs has a really really long note, like 3 whole notes tied together, add it to the table ($60) and make an alias for it (whole_x3). If your song contains a note that is seven 8th notes long (a half note plus a dotted quarter note tied together), add it to the table ($1C) and make an alias for it (seven_eighths).
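The count values for dotted and tied notes are plain arithmetic. A dotted note adds half of its own count, and tied notes add their counts together. The C sketch below checks the table entries and the two custom examples ($60 and $1C). The helper names dotted and tie are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extended table; index = alias & $7F (e.g. d_whole = $8A -> index 10). */
static const uint8_t note_length_table[] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, /* $80-$85: 32nd .. whole */
    0x03, 0x06, 0x0C, 0x18, 0x30        /* $86-$8A: dotted 16th .. dotted whole */
};

/* A dotted note lasts its own count plus the next shorter note (half of it). */
static uint8_t dotted(uint8_t count) {
    assert(count % 2 == 0);             /* a 32nd note ($01) has no shorter entry */
    return (uint8_t)(count + count / 2);
}

/* Tied notes simply add their counts. */
static uint8_t tie(uint8_t a, uint8_t b) {
    assert((unsigned)a + b <= 0xFF);    /* the counter is a single byte */
    return (uint8_t)(a + b);
}
```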
Buffering APU Writes
Until now, we have been writing to the APU one stream at a time. If two different streams shared a channel, they would both write to the same APU ports. If three streams were to share a channel, which is possible if there are two different sound effects loaded into SFX_1 and SFX_2, all three would write to the same APU ports in the same frame. This is bad practice. It can also cause some unwanted noise on the square channels.
A better method is to buffer our writes. Instead of writing to the APU ports directly, each stream will instead write its data to temporary ports in RAM. We'll keep our loop order, so sfx streams will still overwrite the music streams. Then when all the streams are done, we will copy the contents of our temporary RAM ports directly to the APU ports all at once. This ensures that the APU ports only get written to once per frame max. To do this, we first need to reserve some RAM space for our temporary port variables:
soft_apu_ports .rs 16
We reserved 16 bytes for our temporary ports. Each one corresponds to an APU port:
soft_apu_ports+0 -> $4000 ;Square 1 ports
soft_apu_ports+1 -> $4001
soft_apu_ports+2 -> $4002
soft_apu_ports+3 -> $4003
soft_apu_ports+4 -> $4004 ;Square 2 ports
soft_apu_ports+5 -> $4005
soft_apu_ports+6 -> $4006
soft_apu_ports+7 -> $4007
soft_apu_ports+8 -> $4008 ;Triangle ports
soft_apu_ports+9 -> $4009 (unused)
soft_apu_ports+10 -> $400A
soft_apu_ports+11 -> $400B
soft_apu_ports+12 -> $400C ;Noise ports
soft_apu_ports+13 -> $400D (unused)
soft_apu_ports+14 -> $400E
soft_apu_ports+15 -> $400F
Let's implement this by working backwards. First we will edit sound_play_frame and pull our call to se_set_apu out of the loop. We do this because we only want to write to the APU once, after all the streams are done looping:
; sound_play_frame advances the sound engine by one frame
sound_play_frame:
    lda sound_disable_flag
    bne .done                ;if disable flag is set, don't advance a frame
    ldx #$00
.loop:
    lda stream_status, x
    and #$01                 ;check whether the stream is active
    beq .endloop             ;if the channel isn't active, skip it
    ;add the tempo to the ticker total. If it wraps past $FF (carry set), there is a tick
    lda stream_ticker_total, x
    clc
    adc stream_tempo, x
    sta stream_ticker_total, x
    bcc .endloop             ;carry clear = no tick. if no tick, we are done with this stream
    dec stream_note_length_counter, x ;else there is a tick. decrement the note length counter
    bne .endloop             ;if counter is non-zero, our note isn't finished playing yet
    lda stream_note_length, x ;else our note is finished. reload the note length counter
    sta stream_note_length_counter, x
    jsr se_fetch_byte
.endloop:
    inx
    cpx #$06
    bne .loop
    jsr se_set_apu           ;write to the APU once, after all streams are done
.done:
    rts
Next we will modify se_set_apu to copy the temporary APU ports to the real APU ports:
se_set_apu:
    ldy #$0F
.loop:
    cpy #$09
    beq .skip                ;$4009 is unused
    cpy #$0D
    beq .skip                ;$400D is unused
    lda soft_apu_ports, y
    sta $4000, y
.skip:
    dey
    bpl .loop                ;stop the loop when Y goes from $00 -> $FF
    rts
Now we have to write the subroutine that will populate the temporary APU ports with a stream's data. This part will get more complicated as we add more features to our sound engine, but for now it's quite simple:
se_set_temp_ports:
    lda stream_channel, x
    asl a
    asl a
    tay                      ;Y = channel * 4, our offset into soft_apu_ports
    lda stream_vol_duty, x
    sta soft_apu_ports, y    ;vol
    lda #$08
    sta soft_apu_ports+1, y  ;sweep
    lda stream_note_LO, x
    sta soft_apu_ports+2, y  ;period LO
    lda stream_note_HI, x
    sta soft_apu_ports+3, y  ;period HI
    rts
We will make the call to se_set_temp_ports after our call to se_fetch_byte, where the old se_set_apu call was before we snipped it out of the loop. Notice that we don't bother to check the channel before writing the sweep. se_set_apu takes care of this part for us. There's no harm in writing these values to RAM, so we'll avoid branching here to simplify the code.
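The effect of buffering can be shown with a small hypothetical C model. The arrays stand in for soft_apu_ports and for $4000-$400F. The channel numbers 0-3 are an assumption chosen so that channel * 4 matches the port table above. Two streams can write to the same channel in one frame, but the later one (the SFX stream, in loop order) wins. The real registers are then written only once, with $4009 and $400D skipped.

```c
#include <assert.h>
#include <stdint.h>

enum { SQUARE_1, SQUARE_2, TRIANGLE, NOISE }; /* assumed channel numbering 0-3 */

static uint8_t soft_apu_ports[16]; /* RAM buffer */
static uint8_t apu_regs[16];       /* stands in for $4000-$400F */
static int apu_writes;             /* register writes performed */

/* Models se_set_temp_ports: channel * 4 is the offset into the buffer. */
static void set_temp_ports(int channel, uint8_t vol, uint8_t lo, uint8_t hi) {
    assert(channel >= SQUARE_1 && channel <= NOISE);
    int y = channel * 4;
    soft_apu_ports[y]     = vol;
    soft_apu_ports[y + 1] = 0x08;  /* sweep, written even where unused */
    soft_apu_ports[y + 2] = lo;
    soft_apu_ports[y + 3] = hi;
}

/* Models the looped se_set_apu: copy everything except $4009 and $400D. */
static void set_apu(void) {
    for (int y = 15; y >= 0; y--) {
        if (y == 9 || y == 13)
            continue;
        apu_regs[y] = soft_apu_ports[y];
        apu_writes++;
    }
}
```

In one frame, a music stream and then an SFX stream both buffer data for SQUARE_2. After set_apu, only the SFX values reach the registers, and each of the 14 used ports is written exactly once.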
Crackling Sounds
Writing to the 4th port of the Square channels ($4003/$4007) has the side effect of resetting the sequencer. If we write here too often, we will get a nasty crackling sound out of our Squares. This is not good.
The way our engine is set up now, we call se_set_apu once per frame. se_set_apu writes to $4003/$4007, so these ports will get written to once per frame. This is too often. We need to find a way to write here less often. We will do this by cutting out redundant writes. If the value we want to write this frame is the same as the value written last frame, skip the write.
First we will need to keep track of what was last written to the ports. This will require some new variables:
sound_sq1_old .rs 1 ;the last value written to $4003
sound_sq2_old .rs 1 ;the last value written to $4007
Whenever we write to one of these ports, we will also write the value to the corresponding variable (sound_sq1_old or sound_sq2_old). Saving this value will allow us to compare against it next frame. To implement this, we will have to unroll our loop in se_set_apu:
se_set_apu:
    lda soft_apu_ports+0
    sta $4000
    lda soft_apu_ports+1
    sta $4001
    lda soft_apu_ports+2
    sta $4002
    lda soft_apu_ports+3
    sta $4003
    sta sound_sq1_old        ;save the value we just wrote to $4003
    lda soft_apu_ports+4
    sta $4004
    lda soft_apu_ports+5
    sta $4005
    lda soft_apu_ports+6
    sta $4006
    lda soft_apu_ports+7
    sta $4007
    sta sound_sq2_old        ;save the value we just wrote to $4007
    lda soft_apu_ports+8
    sta $4008
    lda soft_apu_ports+10    ;there is no $4009, so we skip it
    sta $400A
    lda soft_apu_ports+11
    sta $400B
    lda soft_apu_ports+12
    sta $400C
    lda soft_apu_ports+14    ;there is no $400D, so we skip it
    sta $400E
    lda soft_apu_ports+15
    sta $400F
    rts
Now we have a variable that will keep track of the last value written to a channel's 4th port. The next step is to add a check before we write:
se_set_apu:
.square1:
    lda soft_apu_ports+0
    sta $4000
    lda soft_apu_ports+1
    sta $4001
    lda soft_apu_ports+2
    sta $4002
    lda soft_apu_ports+3
    cmp sound_sq1_old        ;compare to last write
    beq .square2             ;don't write this frame if they were equal
    sta $4003
    sta sound_sq1_old        ;save the value we just wrote to $4003
.square2:
    lda soft_apu_ports+4
    sta $4004
    lda soft_apu_ports+5
    sta $4005
    lda soft_apu_ports+6
    sta $4006
    lda soft_apu_ports+7
    cmp sound_sq2_old        ;compare to last write
    beq .triangle            ;don't write this frame if they were equal
    sta $4007
    sta sound_sq2_old        ;save the value we just wrote to $4007
.triangle:
    lda soft_apu_ports+8
    sta $4008
    lda soft_apu_ports+10    ;there is no $4009, so we skip it
    sta $400A
    lda soft_apu_ports+11
    sta $400B
.noise:
    lda soft_apu_ports+12
    sta $400C
    lda soft_apu_ports+14    ;there is no $400D, so we skip it
    sta $400E
    lda soft_apu_ports+15
    sta $400F
    rts
Finally we have to consider initialization. The only case we really have to worry about is the first time a song is played in the game. Consider what happens if we initialize the sound_sq1_old and sound_sq2_old variables to $00. We are essentially saying that on startup (RESET) the last byte written to $4003/$4007 was a $00, which isn't true of course. On startup, no write has ever been made to these ports. If we initialize to $00, and if the first note of the first song played has a $00 for the high 3 bits of its period, it will get skipped. That is not what we want. Instead, we should initialize these variables to some value that will never be written to $4003/$4007, like $FF. This ensures that the first note(s) played in the game won't be skipped.
sound_init:
    lda #$0F
    sta $4015                ;enable Square 1, Square 2, Triangle and Noise channels
    lda #$00
    sta sound_disable_flag   ;clear disable flag
    ;later, if we have other variables we want to initialize, we will do that here.
    lda #$FF
    sta sound_sq1_old        ;$FF is never written to $4003, so the first write can't be skipped
    sta sound_sq2_old        ;same for $4007
    lda #$30
    sta soft_apu_ports       ;set Square 1 volume to 0
    sta soft_apu_ports+4     ;set Square 2 volume to 0
    sta soft_apu_ports+12    ;set Noise volume to 0
    lda #$80
    sta soft_apu_ports+8     ;silence Triangle
    rts
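The compare-before-write logic and the reason for the $FF starting value can be modeled in C. This is a hypothetical sketch for Square 1 only. sound_sq1_old mirrors the engine variable, while writes_4003 and the helper functions are illustrative. The range check reflects this engine, which only ever stores the three high period bits in $4003.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t sound_sq1_old = 0xFF; /* "last value written to $4003" */
static int writes_4003;              /* counts real writes, for illustration */

/* Hypothetical reset helper so different starting values can be compared. */
static void sq1_reset(uint8_t initial_old) {
    sound_sq1_old = initial_old;
    writes_4003 = 0;
}

/* Models the CMP/BEQ/STA sequence: returns 1 if $4003 would be written. */
static int write_4003_if_changed(uint8_t value) {
    assert(value <= 0x07);           /* this engine only stores period bits 8-10 */
    if (value == sound_sq1_old)
        return 0;                    /* unchanged: skip, no sequencer reset */
    sound_sq1_old = value;           /* remember what we wrote */
    writes_4003++;
    return 1;
}
```

Starting from $00, a first note whose high period byte is $00 would be skipped. Starting from $FF, it is written, and a held note then costs only one write.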
Rests
The final topic we will cover this lesson is rests. A rest is a period of silence in between notes. Like notes, rests can be of variable length: quarter rest, half rest, whole rest, etc. In other words a rest is a silent note.
So how will we implement it? We will handle rests by treating a rest as a special-case note. We will give the rest a dummy period in our note table. Then, when we fetch a byte from the data stream and determine the byte to be a note, we will add an extra check to see if that note is a rest. If it is, we will make sure that it silences the stream.
First let's add the rest to our note table. We will give it a dummy period. It doesn't really matter what value we use. I'm going to give it a period of $0000. We will also want to add the rest to our list of note aliases:
note_table:
    .word $07F1, $0780, etc...
    ;....more note table values here
    .word $0000 ;rest. Last entry
;Note: octaves in music traditionally start at C, not A
A1 = $00 ;the "1" means Octave 1
As1 = $01 ;the "s" means "sharp"
Bb1 = $01 ;the "b" means "flat" A# == Bb, so same value
B1 = $02
;..... other aliases here
F9 = $5c
Fs9 = $5d
Gb9 = $5d
rest = $5e
Now we can use the symbol "rest" in our music data. "rest" will evaluate to the value $5E, which falls within our note range ($00-$7F). When our sound engine encounters a $5E in the data stream, it will pull the period ($0000) from the note table and store it in RAM. A period of $0000 is actually low enough to silence the square channels, but the triangle channel is still audible at this period so we have more work to do.
Checking for a rest
When we encounter a rest, we will want to tell the sound engine to shut this stream up until the next note. The rest functions differently from all the other notes, so we will need to make a special check for it in our code. We will make a subroutine se_check_rest to do this for us:
se_fetch_byte:
    lda stream_ptr_LO, x
    sta sound_ptr
    lda stream_ptr_HI, x
    sta sound_ptr+1
    ldy #$00
.fetch:
    lda [sound_ptr], y
    bpl .note                ;if < #$80, it's a Note
    cmp #$A0
    bcc .note_length         ;else if < #$A0, it's a Note Length
.opcode:                     ;else it's an opcode
    ;do Opcode stuff
.note_length:
    ;do Note Length stuff
.note:
    ;do Note stuff
    sty sound_temp1          ;save our index into the data stream
    asl a                    ;note table entries are words, so multiply by 2
    tay
    lda note_table, y
    sta stream_note_LO, x
    lda note_table+1, y
    sta stream_note_HI, x
    ldy sound_temp1          ;restore data stream index
    ;check if it's a rest
    jsr se_check_rest
.update_pointer:
    iny                      ;count the byte we just read
    tya
    clc
    adc stream_ptr_LO, x     ;advance the stream pointer by the bytes read
    sta stream_ptr_LO, x
    bcc .end
    inc stream_ptr_HI, x
.end:
    rts
se_check_rest will check to see if the note value is equal to $5E or not. If it is, we will need to tell the sound engine to silence the stream. If the note isn't equal to $5E, we can go on our merry way.
How will we silence our stream then? This is actually a little complicated. Recall that the stream's volume (stream_vol_duty) is set in the song's header. se_set_temp_ports copies the value of stream_vol_duty to soft_apu_ports. If we have se_check_rest modify the stream_vol_duty variable directly (set it to 0 volume), the old volume value disappears. We won't know what to restore it to when we are done with our rest. Oh no!
What we will want to do instead is leave stream_vol_duty alone. We will copy it into soft_apu_ports every frame as usual. Then, after the copy, we will check to see if we are currently resting. If we are, we will write to soft_apu_ports again with a value that sets the volume to 0. Make sense?
To do this we will need to keep track of our resting status in a variable. If our sound engine encounters a $5E in the data stream, we'll turn our resting status on. If it reads any other note, we'll turn our resting status off. There are only two possibilities: on or off. Rather than declare a whole new block of variables and waste six bytes of RAM, let's assign one of the bits in our stream_status variable to be our rest indicator:
Stream Status Byte
76543210
      ||
      |+- Enabled (0: stream disabled; 1: enabled)
      +-- Rest (0: not resting; 1: resting)
Our new subroutine se_check_rest will be in charge of setting or clearing this bit of the status byte:
se_check_rest:
    lda [sound_ptr], y       ;read the note byte again
    cmp #rest                ;is it a rest? (==$5E)
    bne .not_rest
    lda stream_status, x
    ora #%00000010           ;if so, set the rest bit in the status byte
    bne .store               ;this will always branch. bne is cheaper than a jmp.
.not_rest:
    lda stream_status, x
    and #%11111101           ;clear the rest bit in the status byte
.store:
    sta stream_status, x
    rts
Then we modify se_set_temp_ports to check the rest bit and silence the stream if it is set:
se_set_temp_ports:
    lda stream_channel, x
    asl a
    asl a
    tay                      ;Y = channel * 4, our offset into soft_apu_ports
    lda stream_vol_duty, x
    sta soft_apu_ports, y    ;vol
    lda #$08
    sta soft_apu_ports+1, y  ;sweep
    lda stream_note_LO, x
    sta soft_apu_ports+2, y  ;period LO
    lda stream_note_HI, x
    sta soft_apu_ports+3, y  ;period HI
    ;check the rest flag. if set, overwrite volume with silence value
    lda stream_status, x
    and #%00000010
    beq .done                ;if clear, no rest, so quit
    lda stream_channel, x
    cmp #TRIANGLE            ;if triangle, silence with #$80
    beq .tri
    lda #$30                 ;else, silence with #$30
    bne .store               ;this will always branch. bne is cheaper than a jmp.
.tri:
    lda #$80
.store:
    sta soft_apu_ports, y
.done:
    rts
That's it. Now our engine supports rests! They work just like notes, so their lengths are controlled with note lengths:
song_data: ;this data has two quarter rests in it.
.byte half, C2, quarter, rest, eighth, D4, C4, quarter, B3, rest
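The rest mechanism has two parts. The first sets or clears bit 1 of the status byte. The second, applied after the normal copy, overwrites the buffered volume with a silence value while leaving stream_vol_duty untouched. Here is a hypothetical C model of both parts. The TRIANGLE value of 2 is an assumption matching the soft_apu_ports+8 slot.

```c
#include <assert.h>
#include <stdint.h>

#define REST        0x5E  /* the "rest" note alias */
#define STATUS_REST 0x02  /* bit 1 of stream_status */
#define TRIANGLE    2     /* assumed channel number for the triangle */

/* Models se_check_rest: returns the updated status byte. */
static uint8_t check_rest(uint8_t status, uint8_t note) {
    if (note == REST)
        return (uint8_t)(status | STATUS_REST);            /* ORA #%00000010 */
    return (uint8_t)(status & (uint8_t)~STATUS_REST);      /* AND #%11111101 */
}

/* Models the end of se_set_temp_ports: the byte that ends up in the
   buffered volume port. stream_vol_duty itself is never modified. */
static uint8_t buffered_volume(uint8_t status, int channel, uint8_t vol_duty) {
    assert(channel >= 0 && channel <= 3);
    if (!(status & STATUS_REST))
        return vol_duty;                         /* not resting: normal volume */
    return channel == TRIANGLE ? 0x80 : 0x30;    /* silence value for the channel */
}
```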
Putting It All Together
Download and unzip the sample files. Make sure the following files are in the same folder as NESASM3:
Double click tempo.bat. That will run NESASM3 and should produce the tempo.nes file. Run that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:
Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX
Song0 is a silence song. Not selectable. tempo.asm "plays" song0 to stop the music when you press down. See song0.i to find out how it works.
Song1 is last week's evil sounding series of minor thirds, but much faster now thanks to tempo settings.
Song2 is the same short sound effect from last week.
Song3 is a simple descending chord progression. We saved some bytes in the triangle data using note lengths (compare to last week's file)
Song4 is a new song that showcases variable note lengths and rests.
Song5 is a short sound effect. It plays 10 notes extremely fast. Play it over songs and see how it steals the SQ2 channel from the music.
Try creating your own songs and sound effects and add them into the mix. To add a new song you will need to take the following steps:
1) create a song header and song data (use the included songs as reference). Don't forget to add tempos for each stream in your header. Data streams are terminated with $FF.
2) add your header to the song_headers pointer table at the bottom of sound_engine.asm
3) update the constant NUM_SONGS to reflect the new song number total (also at the bottom of sound_engine.asm)
Although not necessary, I recommend keeping your song data in a separate file like I've done with song0.i, song1.i, song2.i and song3.i. This makes it easier to find the data if you need to edit your song later. If you do this, don't forget to .include your file.
Next Week: Volume Envelopes
Nerdy Nights Sound: Part 7: Volume Envelopes
Last Week: Tempo, Note Lengths, Buffering and Rests
This Week: Volume Envelopes
Volume Envelopes
This week we will add volume envelopes to our engine. A volume envelope is a series of volume values that are applied to a note one frame at a time. For example, if we had a volume envelope that looked like this:
F E D C 9 5 0
Then whenever we played a note, it would have a volume of F on the first frame, a volume of E on the second frame, then D, then C, then 9, then 5 until it is finally silenced with a volume of 0 on the 7th frame. Applying this volume envelope on our notes would give them a sharp, short staccato feel. Conversely, if we had a volume envelope that looked like this:
1 1 2 2 3 3 4 4 7 7 8 8 A A C C D D E E F F F
Each note would start very quietly and fade in to full volume. Look at this volume envelope:
D D D C B 0 0 0 0 0 0 0 0 6 6 6 5 4 0
Here we start at a high volume (D) and let it ring for 5 frames. Then we silence the note for 8 frames. Then the note comes back at a very low volume for 5 frames. Notes using this volume envelope would sound like they had a faint echo.
As you can see, volume envelopes are pretty cool. We can get a lot of different sounds out of them. Let's add them in.
Volume envelopes are best suited for the square and noise channels where we have full control of the volume. The triangle channel on the other hand doesn't allow much volume control. It only has two settings: full blast and off. We can still apply volume envelopes in a limited way though. Consider these two volume envelopes:
0F 0E 0D 0C 09 05 00
04 04 05 05 06 06 07 07 08 08 09 09 0A 0A 00
These two envelopes would have a vastly different sound on the square channels, but to the triangle they look like this:
On On On On On On Off
On On On On On On On On On On On On On On Off
You don't get the subtle shifts in volume, but you do get a different length. We can use volume envelopes that end in 00 to control when the triangle key-off occurs. Not as cool as full volume control, but still useful.
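Because the triangle only distinguishes zero from non-zero, an envelope collapses to an on/off pattern. This hypothetical C helper prints that pattern: 'o' for on, '-' for off, stopping at the $FF terminator. It reproduces the two rows above and also shows the echo envelope switching the triangle back on after its silent gap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one character per frame ('o' = on, '-' = off) until $FF.
   Returns the number of frames written; out is always NUL-terminated. */
static size_t triangle_pattern(const uint8_t *env, char *out, size_t max) {
    size_t n = 0;
    assert(env != NULL && out != NULL && max > 0);
    while (env[n] != 0xFF && n + 1 < max) {
        out[n] = env[n] ? 'o' : '-';   /* any non-zero volume means "on" */
        n++;
    }
    out[n] = '\0';
    return n;
}
```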
Defining volume envelopes
First let's define some volume envelopes so we have some data to work with. We'll use some of the examples from above:
se_ve_1:
    .byte $0F, $0E, $0D, $0C, $09, $05, $00
    .byte $FF
se_ve_2:
    .byte $01, $01, $02, $02, $03, $03, $04, $04, $07, $07
    .byte $08, $08, $0A, $0A, $0C, $0C, $0D, $0D, $0E, $0E
    .byte $0F, $0F
    .byte $FF
se_ve_3:
    .byte $0D, $0D, $0D, $0C, $0B, $00, $00, $00, $00, $00
    .byte $00, $00, $00, $00, $06, $06, $06, $05, $04, $00
    .byte $FF
Notice that I terminated each envelope with $FF. We need some terminator value so the engine will know when we've reached the end of the envelope. We could have used any value, but $FF is pretty common.
Next we will make a pointer table that holds the addresses of our volume envelopes:
volume_envelopes:
    .word se_ve_1, se_ve_2, se_ve_3
Declaring variables
In order to apply a volume envelope to a particular stream, we will need a variable that tells us which one to use. We will also need an index variable that tells us our current position within the volume envelope:
stream_ve .rs 6 ;current volume envelope
stream_ve_index .rs 6 ;current position within the volume envelope
stream_ve will tell us which volume envelope to use. Code-wise, it will act as an index into our pointer table so we know where to read from. Sound familiar? It works the same way as "song number" did for loading a song. We aren't there yet, but here's a peek at how we will use these variables to read from the volume envelopes. (x holds the stream number):
    sty sound_temp1          ;save y because we are about to destroy it.
    lda stream_ve, x         ;which volume envelope?
    asl a                    ;multiply by 2 because we are indexing into a table of addresses (words)
    tay
    lda volume_envelopes, y  ;get the low byte of the address from the pointer table
    sta sound_ptr
    lda volume_envelopes+1, y ;get the high byte of the address
    sta sound_ptr+1
    ldy stream_ve_index, x   ;our current position within the volume envelope.
    lda [sound_ptr], y       ;grab the value.
    ;check against $FF (our termination value)
    ;set the volume
    ;increment stream_ve_index
Compare this code to the beginning of the sound_load routine. Are you starting to see a pattern?
Whenever we add a new feature, we need to consider how we should initialize it. Every stream in our music data will potentially have a different volume envelope, so we should add a volume envelope field to our header. Volume envelopes will deprecate our old "initial volume" field, but we will still need to have duty cycle info, so we'll just rename that field:
main header:
byte # | what it tells us
00 | number of streams
01+ | stream headers (one for each stream)
stream headers:
byte # | what it tells us
00 | which stream (stream number)
01 | status byte
02 | which channel
03 | initial duty (for triangle, set bit 7)
04 | volume envelope
05-06 | pointer to data stream
07 | initial tempo
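This layout can be walked the way sound_load does it: one "number of streams" byte, then one 8-byte stream header per stream. The C sketch below is a hypothetical model of that walk, not the engine's code. The sample bytes in the test are arbitrary; they are not the engine's real SFX_1 or SQUARE_2 constants. Like any 6502 .word, the data pointer is stored low byte first.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  stream;   /* byte 00: which stream */
    uint8_t  status;   /* byte 01: status byte */
    uint8_t  channel;  /* byte 02: which channel */
    uint8_t  duty;     /* byte 03: initial duty */
    uint8_t  ve;       /* byte 04: volume envelope number */
    uint16_t data_ptr; /* bytes 05-06: pointer to data stream, low byte first */
    uint8_t  tempo;    /* byte 07: initial tempo */
} StreamHeader;

static StreamHeader parse_stream_header(const uint8_t *h) {
    StreamHeader s;
    assert(h != 0);
    s.stream   = h[0];
    s.status   = h[1];
    s.channel  = h[2];
    s.duty     = h[3];
    s.ve       = h[4];
    s.data_ptr = (uint16_t)(h[5] | (h[6] << 8)); /* little-endian .word */
    s.tempo    = h[7];
    return s;
}

/* Stream header i starts after the count byte, 8 bytes per stream. */
static StreamHeader song_stream_header(const uint8_t *song, int i) {
    assert(i >= 0 && i < song[0]);
    return parse_stream_header(song + 1 + 8 * i);
}
```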
To read this data from the header, we will have to insert the following code into our sound_load routine (after reading the duty):
    iny                      ;move to the next header byte
    lda [sound_ptr], y       ;the stream's volume envelope
    sta stream_ve, x
Notes will always start from the beginning of the volume envelope, so we can just initialize stream_ve_index to 0:
lda #$00
sta stream_ve_index, x
Now we just need to make sure to assign volume envelopes to all the streams in our song data and we're ready to go:
.byte $01 ;1 stream
.byte SFX_1 ;which stream
.byte $01 ;status byte (stream enabled)
.byte SQUARE_2 ;which channel
.byte $70 ;initial duty (01). Initial volume deprecated.
.byte $00 ;the first volume envelope (se_ve_1)
.word song5_square2 ;pointer to stream
.byte $FF ;tempo..very fast tempo
Remember that you can always create descriptive aliases for your volume envelopes if you don't want to remember which number is which:
;volume envelope aliases
ve_short_staccato = $00
ve_fade_in = $01
ve_blip_echo = $02
.byte $01 ;1 stream
.byte SFX_1 ;which stream
.byte $01 ;status byte (stream enabled)
.byte SQUARE_2 ;which channel
.byte $70 ;initial duty (01). Initial volume deprecated.
.byte ve_short_staccato ;the first volume envelope (se_ve_1)
.word song5_square2 ;pointer to stream
.byte $FF ;tempo..very fast tempo
Using aliases is a good idea because the assembler will give you an error if you mistype your alias. If you mistype your number, and it is still a valid number, the assembler won't know there's a problem and will assemble it. This kind of bug in your data can be hard to trace.
Implementing Volume Envelopes
To implement volume envelopes, we need to modify the code where we set the volume. Instead of using a fixed value like we were doing before, we need to read from our current position in the volume envelope and use that value instead. Our volume code is starting to get a little complicated, so let's pull it out into its own subroutine. This will make our code easier to follow:
; se_set_temp_ports will copy a stream's sound data to the temporary apu variables
; input:
; X: stream number
se_set_temp_ports:
    lda stream_channel, x
    asl a
    asl a
    tay                      ;Y = channel * 4, our offset into soft_apu_ports
    jsr se_set_stream_volume ;let's stick all of our volume code into a new subroutine
                             ;less cluttered that way
    lda #$08
    sta soft_apu_ports+1, y  ;sweep
    lda stream_note_LO, x
    sta soft_apu_ports+2, y  ;period LO
    lda stream_note_HI, x
    sta soft_apu_ports+3, y  ;period HI
    rts
What should our new subroutine se_set_stream_volume do? First it needs to read a value from our stream's volume envelope. Then it needs to modify the stream's volume using that value. Then we need to update our position within the volume envelope. Finally it needs to check to see if we are resting, and silence the stream if we are (we wrote this code last week). It looks something like this:
se_set_stream_volume:
sty sound_temp1 ;save our index into soft_apu_ports (we are about to destroy y)
lda stream_ve, x ;which volume envelope?
asl a ;multiply by 2 because we are indexing into a table of addresses (words)
tay
lda volume_envelopes, y ;get the low byte of the address from the pointer table
sta sound_ptr ;put it into our pointer variable
lda volume_envelopes+1, y ;get the high byte of the address
sta sound_ptr+1
.read_ve:
ldy stream_ve_index, x ;our current position within the volume envelope.
lda [sound_ptr], y ;grab the value.
cmp #$FF
bne .set_vol ;if not FF, set the volume
dec stream_ve_index, x ;else if FF, go back one and read again
jmp .read_ve ; FF essentially tells us to repeat the last
; volume value for the remainder of the note
.set_vol:
sta sound_temp2 ;save our new volume value (about to destroy A)
lda stream_vol_duty, x ;get current vol/duty settings
and #$F0 ;zero out the old volume
ora sound_temp2 ;OR our new volume in.
ldy sound_temp1 ;get our index into soft_apu_ports
sta soft_apu_ports, y ;store the volume in our temp port
inc stream_ve_index, x ;set our volume envelope index to the next position
;check the rest flag. if set, overwrite volume with silence value
lda stream_status, x
and #%00000010
beq .done ;if clear, no rest, so quit
lda stream_channel, x
cmp #TRIANGLE ;if triangle, silence with #$80
beq .tri ;else, silence with #$30
lda #$30
bne .store ;this always branches. bne is cheaper than a jmp
.tri:
lda #$80
.store:
sta soft_apu_ports, y
.done:
rts
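To make the envelope logic easier to try out away from an emulator, here is a small Python model of it. It is a sketch, not engine code. It assumes an envelope is a list of 4-bit volume values terminated by $FF, like the tables in vol_envelopes.i, and that the list contains at least one real value before the $FF. The function names are hypothetical: read_envelope mirrors the .read_ve loop and apply_volume mirrors the AND #$F0 / ORA pair.

```python
ENVELOPE_END = 0xFF  # terminator: "hold the last volume for the rest of the note"


def read_envelope(envelope, index):
    """One frame of the .read_ve loop: return (volume, next_index).

    If the byte at `index` is $FF, step back one position and read again,
    so the last real volume is repeated. Assumes at least one real value
    comes before the $FF.
    """
    while envelope[index] == ENVELOPE_END:
        index -= 1
    return envelope[index], index + 1


def apply_volume(vol_duty, volume):
    """AND #$F0 / ORA: keep the upper nibble, drop in the new volume."""
    return (vol_duty & 0xF0) | (volume & 0x0F)


def envelope_volumes(envelope, frames):
    """Volumes produced over `frames` frames of one note (index starts at 0)."""
    index = 0
    volumes = []
    for _ in range(frames):
        volume, index = read_envelope(envelope, index)
        volumes.append(volume)
    return volumes


print(envelope_volumes([15, 12, 8, 4, ENVELOPE_END], 7))  # [15, 12, 8, 4, 4, 4, 4]
```

Once the envelope reaches $FF, the index keeps landing on the terminator. Each frame then steps back one position and plays the last real value again, which is exactly the "hold" behavior the assembly gets from its DEC and JMP.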
After we read a value from our volume envelope, we AND stream_vol_duty with #$F0. This has the nice effect of clearing the old volume while preserving our squares' duty cycle settings. But we need to be careful here. Recall that the triangle channel's on/off status is controlled by the low 7 bits of the port:
TRI_CTRL ($4008)
76543210
||||||||
|+++++++- Value
+-------- Control Flag (0: use internal counters; 1: disable internal counters)
If any of those Value bits are set, the triangle channel will be considered on. Now consider what happens if bit 4, 5 or 6 of stream_vol_duty happens to be set. ANDing with #$F0 clears only bits 0-3, so those bits survive. Even when the volume we pull from the volume envelope is 0, the triangle channel stays on. If we are careful not to set these bits in our song headers, the problem should never come up. But for completeness we should fix it. In the version below, a zero volume on the triangle channel is replaced by the silence value #$80:
se_set_stream_volume:
sty sound_temp1 ;save our index into soft_apu_ports (we are about to destroy y)
lda stream_ve, x ;which volume envelope?
asl a ;multiply by 2 because we are indexing into a table of addresses (words)
tay
lda volume_envelopes, y ;get the low byte of the address from the pointer table
sta sound_ptr ;put it into our pointer variable
lda volume_envelopes+1, y ;get the high byte of the address
sta sound_ptr+1
.read_ve:
ldy stream_ve_index, x ;our current position within the volume envelope.
lda [sound_ptr], y ;grab the value.
cmp #$FF
bne .set_vol ;if not FF, set the volume
dec stream_ve_index, x ;else if FF, go back one and read again
jmp .read_ve ; FF essentially tells us to repeat the last
; volume value for the remainder of the note
.set_vol:
sta sound_temp2 ;save our new volume value (about to destroy A)
lda stream_channel, x
cmp #TRIANGLE
bne .squares ;if not triangle channel, go ahead
lda sound_temp2
bne .squares ;else if volume not zero, go ahead (treat same as squares)
lda #$80
bmi .store_vol ;else silence the channel with #$80 (always branches: #$80 sets the N flag)
.squares:
lda stream_vol_duty, x ;get current vol/duty settings
and #$F0 ;zero out the old volume
ora sound_temp2 ;OR our new volume in.
.store_vol:
ldy sound_temp1 ;get our index into soft_apu_ports
sta soft_apu_ports, y ;store the volume in our temp port
inc stream_ve_index, x ;set our volume envelope index to the next position
;check the rest flag. if set, overwrite volume with silence value
lda stream_status, x
and #%00000010
beq .done ;if clear, no rest, so quit
lda stream_channel, x
cmp #TRIANGLE ;if triangle, silence with #$80
beq .tri ;else, silence with #$30
lda #$30
bne .store ;this always branches. bne is cheaper than a jmp
.tri:
lda #$80
.store:
sta soft_apu_ports, y
.done:
rts
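The same fix is easier to see in a small Python model. It follows the tutorial's rule that the triangle counts as on whenever any of the low 7 bits of TRI_CTRL are set. The function names and the $90 header value are purely illustrative.

```python
TRIANGLE_SILENCE = 0x80  # control flag set, all 7 Value bits clear


def triangle_is_on(port_value):
    """Per the TRI_CTRL layout: any set Value bit (bits 0-6) keeps it on."""
    return (port_value & 0x7F) != 0


def naive_triangle_value(vol_duty, volume):
    """First version: treat the triangle exactly like the squares."""
    return (vol_duty & 0xF0) | volume


def fixed_triangle_value(vol_duty, volume):
    """Fixed version: a volume of 0 always silences the triangle with $80."""
    if volume == 0:
        return TRIANGLE_SILENCE
    return (vol_duty & 0xF0) | volume


# A header value with bit 4 set by accident: $90 = %10010000
print(triangle_is_on(naive_triangle_value(0x90, 0)))  # True: still audible
print(triangle_is_on(fixed_triangle_value(0x90, 0)))  # False: silenced
```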
New notes
The last thing we need to consider is new notes. When an old note finishes and we start playing a new note, we will want to reset the volume envelope back to the beginning. This is as easy as setting stream_ve_index to 0 when we read a new note:
;...snip... (setup pointers, read byte, test range, etc)
;do Note stuff
sty sound_temp1 ;save our index into the data stream
asl a ;multiply by 2 because note_table is a table of words
tay
lda note_table, y
sta stream_note_LO, x
lda note_table+1, y
sta stream_note_HI, x
ldy sound_temp1 ;restore data stream index
lda #$00
sta stream_ve_index, x ;reset the volume envelope.
;check if it's a rest and modify the status flag appropriately
jsr se_check_rest
;...snip... (update pointer)
And now we have volume envelopes.
Putting It All Together
Download and unzip the sample files. Make sure the following files are in the same folder as NESASM3:
Double click envelopes.bat. That will run NESASM3 and should produce the envelopes.nes file. Run that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:
Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX
Song0 is a silence song. Not selectable.
Song1 is a boss song from The Guardian Legend, almost the same as the original.
Song2 is the same short sound effect from last week.
Song3 is a song from Dragon Warrior, very close to the original.
Song4 is the same song4 as last week, but volume envelopes allow us to save some bytes by reducing rests.
Song5 is a short sound effect, same as last week.
Try creating your own songs and sound effects and add them into the mix. To add a new song you will need to take the following steps:
1) create a song header and song data (use the included songs as reference). Don't forget to select a volume envelope for each stream in your header. Data streams are terminated with $FF.
2) add your header to the song_headers pointer table at the bottom of sound_engine.asm
3) update the constant NUM_SONGS to reflect the new song number total (also at the bottom of sound_engine.asm)
Try making your own volume envelopes too. To do so you will need to modify vol_envelopes.i. Remember that volume envelopes are terminated with $FF.
Next Week: Opcodes, Looping
Nerdy Nights Sound: Part 8: Opcodes and Looping
Last Week: Volume Envelopes
This Week: Opcodes and Looping
So far our sound engine handles two types of data that it reads from music data streams: notes and note lengths. This is enough to write complex music but of course we are going to want more features. We will want control over the sound of our notes. What if we want to change duty cycles midstream? Or volume envelopes? Or keys? What if we want to loop one part of the song four times? Or loop the entire song continuously? What if we want to play a sound effect as part of a song?
All of these types of features, features where you are issuing commands to the engine, are going to be done through opcodes (also called control codes or command codes). An opcode is a value in the data stream that tells the engine to run a specific, specialized subroutine or piece of code. Most opcodes will have arguments sent along with them. For example, an opcode that changes a stream's volume envelope will come with an argument that specifies which volume envelope to change to.
We've actually been using an opcode for weeks; I just haven't mentioned it. It's the opcode that ends a sound, and we've been encoding it in our data streams as $FF. Here is the code we've been using:
;---snip--- (fetch a byte and range test)
.opcode: ;else it's an opcode
;do Opcode stuff
cmp #$FF
bne .end
lda stream_status, x ;if $FF, end of stream, so disable it and silence
and #%11111110
sta stream_status, x ;clear enable flag in status byte
lda stream_channel, x
cmp #TRIANGLE
beq .silence_tri ;triangle is silenced differently from squares and noise
lda #$30 ;squares and noise silenced with #$30
bne .silence ;this always branches. bne is cheaper than a jmp
.silence_tri:
lda #$80 ;triangle silenced with #$80
.silence:
sta stream_vol_duty, x ;store silence value in the stream's volume variable.
jmp .update_pointer ;done
;---snip--- (do note lengths and notes, update the stream's pointer)
Here we check if the byte read has a value of $FF. If so we turn the stream off and silence it. That's an opcode.
It would be pretty messy if every opcode we had was just written straight out like this. Normally we would pull this code into its own subroutine, like this:
;---snip--- (fetch a byte and range test)
.opcode: ;else it's an opcode
;do Opcode stuff
cmp #$FF ;end sound opcode
bne .end
jsr se_op_endsound ;call the endsound subroutine
jmp .update_pointer ;done. the stream has ended, so there is no next byte to grab
;---snip--- (do note lengths and notes, update the stream's pointer)
se_op_endsound:
lda stream_status, x ;end of stream, so disable it and silence
and #%11111110
sta stream_status, x ;clear enable flag in status byte
lda stream_channel, x
cmp #TRIANGLE
beq .silence_tri ;triangle is silenced differently from squares and noise
lda #$30 ;squares and noise silenced with #$30
bne .silence ;this always branches. bne is cheaper than a jmp
.silence_tri:
lda #$80 ;triangle silenced with #$80
.silence:
sta stream_vol_duty, x ;store silence value in the stream's volume variable.
rts
The .opcode branch is much shorter now. If we wanted to add more opcodes, we could just add some more compares:
;do Opcode stuff
cmp #$FF ;is it the end sound opcode?
bne .not_FF
jsr se_op_endsound ;if so, call the end sound subroutine
jmp .end ;and finish
.not_FF:
cmp #$FE ;else is it the loop opcode?
bne .not_FE
jsr se_op_loop ;if so, call the loop subroutine
jmp .opcode_done
.not_FE:
cmp #$FD ;else is it the change volume envelope opcode?
bne .not_FD
jsr se_op_change_ve ;if so, call the change volume envelope subroutine
jmp .opcode_done
.not_FD:
;...and so on: one more compare for every opcode we add
.opcode_done:
iny ;update index to next byte in the data stream
jmp .fetch ;go fetch another byte
This will work, but it's ugly. The more opcodes we add to our engine, the more checks we need to make. What if we have 20 opcodes? Do we really want to do that many compares? It's a waste of ROM space and cycles.
Anytime you find yourself in a situation where you are doing a lot of CMPs on one value, the answer is to use a lookup table. It will simplify everything! We've done it already with notes, note lengths, song numbers and volume envelopes. Could you imagine trying to get a note's period without using the lookup table? It would look like this:
Is the note an A1? If so, use this period, else
Is the note an A#1? If so, use this period, else
Is the note a B1? If so, use this period, else
Is the note a C2? If so, use this period, else
... (about 100 more checks)
Is the note an F#9? If so, use this period, else
Is the note a rest? If so, use this period
That's just crazy. It would be hundreds of lines of unreadable code and you'd run into branch-range errors too. When we use a lookup table, the code is simplified to this:
;do Note stuff
sty sound_temp1 ;save our index
asl a ;multiply by 2 because note_table is a table of words
tay
lda note_table, y
sta stream_note_LO, x
lda note_table+1, y
sta stream_note_HI, x
ldy sound_temp1 ;restore data stream index
Much cleaner. Again, I can't stress it enough: if you find yourself doing lots of CMPs on a single value, use a table instead!
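Why the ASL A? A table built with .word stores every entry as two bytes, low byte first, so entry n starts at byte offset n*2. Here is a Python sketch of that lookup that models the table as the flat bytes the assembler emits. The three period values are example numbers for illustration and are not necessarily the ones in note_table.i.

```python
def words_to_bytes(words):
    """What .word emits: each 16-bit value as its low byte, then its high byte."""
    out = bytearray()
    for word in words:
        out.append(word & 0xFF)
        out.append((word >> 8) & 0xFF)
    return bytes(out)


def read_word(table, index):
    """Mirror of ASL A / TAY / LDA table,y / LDA table+1,y."""
    offset = index * 2
    if offset + 1 > 0xFF:
        # Y is 8 bits, so this indexing scheme reaches at most 128 entries.
        raise IndexError("word index out of range for an 8-bit Y register")
    low = table[offset]       # LDA table, y
    high = table[offset + 1]  # LDA table+1, y
    return (high << 8) | low


periods = words_to_bytes([0x07F1, 0x0780, 0x0713])  # example values only
print(hex(read_word(periods, 2)))  # 0x713
```

The 8-bit Y register also sets a practical limit. A word table indexed this way can hold at most 128 entries, because the last entry's high byte must still be reachable with Y = $FF.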
With notes and note lengths we used a straight lookup table of values. With song numbers and volume envelopes we used a special type of lookup table called a pointer table, which stored data addresses. For opcodes we have two choices: we can use something called a jump table, or we can use an RTS table. They are almost the same, and the difference in performance between the two methods is negligible, so for most programmers it's a matter of personal preference.
I prefer RTS tables myself, but we're going to use jump tables because they are easier to explain and understand.
Jump Tables
Ok, here's our problem: Our sound engine has opcodes. A lot of them, let's say 10 or more. Each opcode has its own subroutine. When our sound engine reads an opcode byte from the data stream, we want to avoid a long list of CMP and BNE instructions to select the right subroutine. How do we do that? We use a jump table.
A jump table is similar to a pointer table: it is a table of addresses. But whereas a pointer table holds addresses that point to the start of data, a jump table holds addresses that point to the start of code (ie, the start of subroutines). For example, suppose we have some subroutines:
sub_a:
lda #$00
ldx #$FF
rts
sub_b:
clc
adc #$03
rts
sub_c:
sec
sbc #$03
rts
Here is how a jump table would look using these subroutines:
.word sub_a, sub_b, sub_c
Hey, that's pretty easy. We just use the subroutine label and the assembler will translate that into the address where the subroutine starts. Let's make a jump table for our sound opcode subroutines:
se_op_endsound:
;do stuff
rts
se_op_infinite_loop:
;do stuff
rts
se_op_change_ve:
;do stuff
rts
;etc.. more subroutines
;this is our jump table
sound_opcodes:
.word se_op_endsound
.word se_op_infinite_loop
.word se_op_change_ve
;etc, one entry per subroutine
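For comparison, here is the same idea in Python. A list of functions plays the role of the .word table, and the opcode byte picks an entry directly with no chain of compares. This is only a sketch that assumes opcodes are numbered from $A0. The handler bodies are placeholders, not the engine's real subroutines.

```python
def op_endsound(stream):
    stream["enabled"] = False


def op_infinite_loop(stream):
    stream["looped"] = True  # placeholder body


def op_change_ve(stream):
    stream["ve_index"] = 0  # placeholder body


# The jump table: entry 0 handles $A0, entry 1 handles $A1, and so on.
SOUND_OPCODES = [op_endsound, op_infinite_loop, op_change_ve]
OPCODE_BASE = 0xA0


def run_opcode(opcode, stream):
    """Dispatch with one table lookup instead of a CMP/BNE chain."""
    handler = SOUND_OPCODES[opcode - OPCODE_BASE]
    handler(stream)
    return handler.__name__


stream = {"enabled": True}
print(run_opcode(0xA0, stream), stream["enabled"])  # op_endsound False
```

Adding an opcode means appending one entry to the list. The dispatch code itself never grows, which is the same payoff the .word table gives on the 6502.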
Cool. We have a jump table now. So how do we use it?
Indirect Jumping
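The 6502 lets us do some cool things. One of those things is called an indirect jump. An indirect jump lets you store a destination address in a two-byte pointer variable and then jump there. The pointer can live anywhere in RAM; we'll put it in the zero page here. (One 6502 quirk to keep in mind: the pointer must not start on the last byte of a page, because JMP [$02FF] reads its high byte from $0200 rather than $0300.) It works like this: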
.rsset $0000
;first declare a pointer variable somewhere in the zero-page
jmp_ptr .rs 2 ;2 bytes because an address is always a word
lda #$00
sta jmp_ptr
lda #$80
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to $8000
Here we stick an address ($8000, lo byte first) into our jmp_ptr variable. Then we do an indirect jump by using the JMP instruction followed by a pointer variable in brackets:
jmp [jmp_ptr] ;indirect jump
This instruction translates into English as "Jump to the address that is stored in jmp_ptr and jmp_ptr+1". It's extremely useful. We can stick any address we want in there:
lda #$00
sta jmp_ptr
lda #$C0
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to $C000
We could read an address from ROM and use that if we wanted to, for example our reset vector:
lda $FFFC
sta jmp_ptr
lda $FFFD
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to our reset routine
And we can use it in combination with our jump table:
lda sound_opcodes, y ;read low byte of address from jump table
sta jmp_ptr
lda sound_opcodes+1, y ;read high byte
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to whatever address we pulled from the table.
Pretty powerful. We can dynamically jump to any section of code we want!
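A small Python model of JMP [ptr] might help here. Memory is a 64 KB bytearray, and the target address is assembled from the two bytes at the pointer, low byte first. The model also reproduces the 6502's page-boundary quirk. If the pointer sits on the last byte of a page ($xxFF), the high byte is fetched from the start of that same page instead of the next one. The function name and example addresses are illustrative.

```python
def jmp_indirect(memory, ptr):
    """Return the address that JMP [ptr] would jump to."""
    low = memory[ptr]
    # 6502 quirk: the high-byte fetch does not carry into the next page.
    high_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    high = memory[high_addr]
    return (high << 8) | low


memory = bytearray(0x10000)
memory[0x0000], memory[0x0001] = 0x00, 0x80  # jmp_ptr = $8000, lo byte first
print(hex(jmp_indirect(memory, 0x0000)))     # 0x8000
```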
So we know how to build a jump table and we know how to do an indirect jump. Let's tie it all together and stick it into our sound engine. Let's start with se_fetch_byte. se_fetch_byte reads a byte from the data stream and range-checks it to see if it is a note, note length or opcode. Recall that notes have a byte range of $00-$7F. Note lengths have a range of $80-$9F. The opcode byte range is $A0-$FF:
se_fetch_byte:
lda stream_ptr_LO, x
sta sound_ptr
lda stream_ptr_HI, x
sta sound_ptr+1
ldy #$00
.fetch:
lda [sound_ptr], y
bpl .note ;if < #$80, it's a Note
cmp #$A0
bcc .note_length ;else if < #$A0, it's a Note Length
.opcode: ;else ($A0-$FF) it's an opcode
;do Opcode stuff
.note_length:
;do note length stuff
.note:
;do note stuff
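The range test boils down to three cases. BPL branches when bit 7 of the byte is clear, so it catches $00-$7F. Of the bytes left, CMP #$A0 followed by BCC catches anything below $A0. Here is that logic as a Python sketch with a hypothetical function name:

```python
def classify(byte):
    """Classify a data-stream byte the way se_fetch_byte's range test does."""
    if (byte & 0x80) == 0:  # BPL: bit 7 clear -> $00-$7F
        return "note"
    if byte < 0xA0:         # CMP #$A0 / BCC: carry clear -> $80-$9F
        return "note length"
    return "opcode"         # everything else: $A0-$FF


print([classify(b) for b in (0x00, 0x7F, 0x80, 0x9F, 0xA0, 0xFF)])
# ['note', 'note', 'note length', 'note length', 'opcode', 'opcode']
```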
So we need to assign our opcodes to values between $A0 and $FF. Just as with notes and note lengths, the opcode byte we read from the data stream will be used as a table index (after subtracting $A0), so we will assign our opcodes in the same order as our table:
sound_opcodes:
.word se_op_endsound ;this should be $A0
.word se_op_infinite_loop ;this should be $A1
.word se_op_change_ve ;this should be $A2
;etc, 1 entry per subroutine
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1 ;be careful of conflicts here. this might be too generic. maybe song_loop is better
volume_envelope = $A2
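The arithmetic that se_opcode_launcher needs is small enough to check by hand. Here it is as a Python sketch: subtract $A0 to get the table index, then double it because each .word entry is two bytes. The addresses in the example table are made up; in a real ROM the assembler fills them in from the subroutine labels.

```python
OPCODE_BASE = 0xA0


def table_offset(opcode):
    """SEC / SBC #$A0 / ASL A: byte offset of this opcode's jump-table entry."""
    index = (opcode - OPCODE_BASE) & 0xFF
    return (index << 1) & 0xFF


def target_address(table_bytes, opcode):
    """LDA sound_opcodes,y / LDA sound_opcodes+1,y: read the entry, lo byte first."""
    offset = table_offset(opcode)
    return table_bytes[offset] | (table_bytes[offset + 1] << 8)


# Hypothetical subroutine addresses for $A0, $A1 and $A2, emitted lo byte first.
table = bytes([0x00, 0x81, 0x40, 0x81, 0x80, 0x81])
print([table_offset(op) for op in (0xA0, 0xA1, 0xA2)])  # [0, 2, 4]
print(hex(target_address(table, 0xA2)))                 # 0x8180
```

The opcode $FF gives an offset of $BE (190), far past the end of a table with only a handful of entries. Every opcode byte used in sound data therefore needs a real entry in the table. Otherwise the launcher reads whatever bytes follow the table and jumps there.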
Now let's alter se_fetch_byte to take care of our opcodes:
se_fetch_byte:
lda stream_ptr_LO, x
sta sound_ptr
lda stream_ptr_HI, x
sta sound_ptr+1
ldy #$00
.fetch:
lda [sound_ptr], y
bpl .note ;if < #$80, it's a Note
cmp #$A0
bcc .note_length ;else if < #$A0, it's a Note Length
.opcode: ;else ($A0-$FF) it's an opcode
;do Opcode stuff
jsr se_opcode_launcher ;launch our opcode!!!
iny ;next position in the data stream
lda stream_status, x
and #%00000001
bne .fetch ;after our opcode is done, grab another byte unless the stream is disabled
rts ; in which case we quit (explained below)
.note_length:
;do note length stuff
.note:
;do note stuff
I added a call to a subroutine called se_opcode_launcher and a little branch. Not a big change, is it? But there's an important detail here. se_opcode_launcher will be a short, simple subroutine that will read from the jump table and perform an indirect jump. It looks like this:
se_opcode_launcher:
sty sound_temp1 ;save y register, because we are about to destroy it
sec
sbc #$A0 ;turn our opcode byte into a table index by subtracting $A0
; $A0->$00, $A1->$01, $A2->$02, etc. Tables index from $00.
asl a ;multiply by 2 because we index into a table of addresses (words)
tay
lda sound_opcodes, y ;get low byte of subroutine address
sta jmp_ptr
lda sound_opcodes+1, y ;get high byte
sta jmp_ptr+1
ldy sound_temp1 ;restore our y register
iny ;set to next position in data stream (assume an argument)
jmp [jmp_ptr] ;indirect jump to our opcode subroutine
Short and simple. So why did I wrap this code in its own subroutine? Why not just stick this code as-is in the .opcode branch of se_fetch_byte? Because we need a place to return to.
The JSR and RTS instructions work as a pair. They go hand in hand. They need each other. Without going into too much detail, this is what goes on behind the scenes:
JSR sticks a return address on the stack and jumps to a subroutine. One way to look at it is to think of JSR as a JMP that remembers where it started from.
RTS pops the return address off the stack and jumps there.
So JSR leaves a treasure map for RTS to pick up and follow later. The key point here is that RTS expects a return address to be waiting for it on the stack.
Now our opcode subroutines all end in an RTS instruction. Do you see the potential problem here?
We call our opcode subroutines using an indirect jump. This requires us to use a JMP instruction, not a JSR instruction. A JMP instruction doesn't remember where it started from: no return address is pushed onto the stack. So when we jump to our opcode subroutine and hit the RTS instruction at the end, there is no return address waiting for us! The RTS will pull whatever two bytes happen to be on top of the stack and jump there. At best we return from the wrong subroutine and skip code we meant to run; at worst we end up somewhere random and our program crashes!
To fix this, we wrap our indirect jump in a subroutine, se_opcode_launcher. We call it with a JSR instruction, completing the JSR/RTS pair:
jsr se_opcode_launcher ;this jsr will let us remember where we came from
This JSR instruction will stick a return address on the stack for us. Then inside se_opcode_launcher we perform our indirect jump to our desired opcode subroutine. Now when we hit that RTS instruction at the end of the opcode subroutine we have a return address waiting for us on the stack. Our program returns back to where we started. We are safe.
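To see why the wrapper matters, here is a simplified Python model of the stack. It is not cycle-accurate, and the two addresses are hypothetical. JSR pushes the address of the next instruction minus one, high byte first, and RTS pulls it back and adds one. JMP pushes nothing, so the opcode subroutine's RTS takes whatever return address is on top of the stack.

```python
class Stack6502:
    """Just enough of the 6502 stack to model JSR and RTS."""

    def __init__(self):
        self.data = []

    def jsr(self, next_instruction):
        ret = next_instruction - 1            # JSR pushes (return address - 1)
        self.data.append((ret >> 8) & 0xFF)   # high byte first
        self.data.append(ret & 0xFF)

    def rts(self):
        low = self.data.pop()
        high = self.data.pop()
        return ((high << 8) | low) + 1        # RTS adds 1 after pulling


AFTER_CALL_TO_FETCH = 0x8123     # hypothetical: where se_fetch_byte's caller continues
AFTER_CALL_TO_LAUNCHER = 0x8456  # hypothetical: the INY after jsr se_opcode_launcher


def opcode_rts_lands_at(use_launcher):
    stack = Stack6502()
    stack.jsr(AFTER_CALL_TO_FETCH)            # someone calls se_fetch_byte
    if use_launcher:
        stack.jsr(AFTER_CALL_TO_LAUNCHER)     # jsr se_opcode_launcher
    # jmp [jmp_ptr] pushes nothing; the opcode subroutine then hits its RTS
    return stack.rts()


print(hex(opcode_rts_lands_at(True)))   # 0x8456: back inside se_fetch_byte
print(hex(opcode_rts_lands_at(False)))  # 0x8123: skips the rest of se_fetch_byte
```

In this tidy model the stray RTS lands in se_fetch_byte's caller and silently skips code. With a different stack layout it could just as easily land somewhere meaningless.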
Opcode Subroutines
With our opcode launcher written, we are all set up to make opcodes. We already have one written: the endsound opcode. This is the opcode we will use to terminate sound effects. Sound effects don't loop continuously like songs do, so they need to be stopped. Let's take a look again:
se_op_endsound:
lda stream_status, x ;end of stream, so disable it and silence
and #%11111110
sta stream_status, x ;clear enable flag in status byte
lda stream_channel, x
cmp #TRIANGLE
beq .silence_tri ;triangle is silenced differently from squares and noise
lda #$30 ;squares and noise silenced with #$30
bne .silence ; (this will always branch. bne is cheaper than a jmp)
.silence_tri:
lda #$80 ;triangle silenced with #$80
.silence:
sta stream_vol_duty, x ;store silence value in the stream's volume variable.
rts
This opcode is special. It's the reason for the check after the call to se_opcode_launcher:
.opcode: ;else ($A0-$FF) it's an opcode
;do Opcode stuff
jsr se_opcode_launcher
iny ;next position in the data stream
lda stream_status, x
and #%00000001
bne .fetch ;after our opcode is done, grab another byte unless the stream is disabled
rts ; in which case we quit (explained below)
Normally, we want se_fetch_byte to keep fetching bytes until it hits a note. Recall that with note lengths we jumped back to .fetch after setting the new note length. This is because after setting the length of the note, we needed to know WHAT note to play. So we fetch another byte. The same thing is true of opcodes. If we change the volume envelope with an opcode, great! But we still need to know what note to play next. If we use an opcode to switch our square's duty cycle, great! But we still need to know what note to play next. If we use an opcode to loop back to the beginning of the song, that's great! But we still need to read that first note of the song. This is why we jump back to fetch a byte after we run an opcode.
The ONE exception to this rule is when we end a sound effect. We are terminating the sound effect completely, so there is no next note. We don't want to fetch something that isn't there, so we need to skip the jump. That's why we check the status byte after we run the opcode. If the stream is disabled by the endsound opcode, we are finished. Otherwise, fetch another byte.
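Here is a Python sketch of that fetch loop that puts the control flow in one place. It is a model, not the engine. List positions stand in for ROM addresses, only three opcodes are modeled, and the opcode numbers follow this week's aliases (endsound = $A0, loop = $A1, volume_envelope = $A2). Note lengths and opcodes send us around the loop again. A note or a disabled stream ends it.

```python
ENDSOUND, LOOP, VOLUME_ENVELOPE = 0xA0, 0xA1, 0xA2


def op_endsound(stream, state, pos):
    state["enabled"] = False
    return pos + 1                      # no argument


def op_loop(stream, state, pos):
    # two-byte argument, low byte first: the position to continue from
    return stream[pos + 1] | (stream[pos + 2] << 8)


def op_volume_envelope(stream, state, pos):
    state["ve"] = stream[pos + 1]       # one-byte argument
    state["ve_index"] = 0
    return pos + 2


OPCODES = [op_endsound, op_loop, op_volume_envelope]  # indexed by byte - $A0


def fetch_byte(stream, pos, state):
    """Read bytes until we find a note or the stream is disabled.

    Returns the position of the next unread byte.
    """
    while True:
        byte = stream[pos]
        if byte < 0x80:                 # note: this is what we were looking for
            state["note"] = byte
            return pos + 1
        if byte < 0xA0:                 # note length: we still need a note
            state["length"] = byte
            pos += 1
            continue
        pos = OPCODES[byte - 0xA0](stream, state, pos)
        if not state["enabled"]:        # endsound: there is no next note
            return pos


song = [0x82, 0x10, VOLUME_ENVELOPE, 3, 0x14, ENDSOUND]
state, pos, notes = {"enabled": True}, 0, []
while True:
    pos = fetch_byte(song, pos, state)
    if not state["enabled"]:
        break
    notes.append(state["note"])
print(notes)  # [16, 20]
```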
The next opcode in our list is the loop opcode. This is the opcode that we will stick at the end of every song to tell the sound engine to play the song again, and again and again. It is actually quite easy to implement. It takes a 2-byte argument, which is the address to loop back to. The subroutine looks like this:
se_op_infinite_loop:
lda [sound_ptr], y ;read LO byte of the address argument from the data stream
sta stream_ptr_LO, x ;save as our new data stream position
iny ;move to the HI byte of the argument
lda [sound_ptr], y ;read HI byte of the address argument from the data stream
sta stream_ptr_HI, x ;save as our new data stream position
sta sound_ptr+1 ;update the pointer to reflect the new position.
lda stream_ptr_LO, x
sta sound_ptr
ldy #$FF ;after opcodes return, we do an iny. Since we reset
;the stream buffer position, we will want y to start out at 0 again.
rts
The first thing to notice about this subroutine is that it reads two bytes from the data stream. This is the address argument that gets passed along with the opcode. To make it clear, let's look at some example sound data:
song1_square1:
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6, C5, Eb5, G5, C6, Eb6, half, G6 ;play some notes
.byte loop ;this alias evaluates to $A1, the loop opcode
.word song1_square1 ;this evaluates to the address of the song1_square1 label
;ie, the address we want to loop to.
After the "loop" opcode comes a word which is the address to loop back to. In this example I chose to loop back to the beginning of the stream data.
So what does our loop opcode do? It reads the first byte of this address argument (the low byte) and stores it in stream_ptr_LO. Then it reads the second byte of the address argument (the high byte) and stores it in stream_ptr_HI. These are the variables that keep track of our data stream position! The loop opcode just changes these values to some address that we specify. Not too complicated at all. The last step is to update the actual pointer (sound_ptr) so that the next byte we read from the data stream will be the first note we looped back to.
In the example sound data above I looped back to the beginning of the stream data, but there's nothing stopping me from looping somewhere else:
;intro, don't loop this part
.byte quarter
.byte C4, C4, C4, C4
.loop_point: ;this is where we will loop back to.
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6, C5, Eb5, G5, C6, Eb6, half, G6
.byte loop ;this alias evaluates to $A1, the loop opcode
.word .loop_point ;this evaluates to the address of the .loop_point label
;ie, the address we want to loop to.
Technically we can also "loop" to a forward position, in which case it's actually more like a jump than a loop. That's all a loop is really: a jump... backwards.
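As a quick check of the byte layout, here is a Python sketch of what `.byte loop` plus `.word <label>` assemble to, and of how the loop opcode reads those bytes back. The address $C123 is made up. The sketch also shows why the subroutine ends with LDY #$FF. Y is an 8-bit register, so the INY that se_fetch_byte performs after the opcode returns wraps $FF around to $00, the first byte at the new position.

```python
LOOP = 0xA1  # the "loop" alias


def encode_loop(target):
    """.byte loop / .word target: the opcode, then the address lo byte first."""
    return bytes([LOOP, target & 0xFF, (target >> 8) & 0xFF])


def op_loop(data, y):
    """Model of the loop opcode. On entry y indexes the first argument byte
    (the launcher already did its INY). Returns (new stream address, new y)."""
    low = data[y]                  # LDA [sound_ptr], y
    y = (y + 1) & 0xFF             # INY
    high = data[y]                 # LDA [sound_ptr], y
    return (high << 8) | low, 0xFF  # LDY #$FF


def iny(y):
    return (y + 1) & 0xFF          # 8-bit register: $FF + 1 wraps to $00


data = encode_loop(0xC123)
address, y = op_loop(data, 1)
print(data.hex(), hex(address), iny(y))  # a123c1 0xc123 0
```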
Changing Volume Envelopes
Let's write the opcode subroutine to change volume envelopes. This one is even easier. It takes one argument: the number of the volume envelope to switch to.
se_op_change_ve:
lda [sound_ptr], y ;read the argument
sta stream_ve, x ;store it in our volume envelope variable
lda #$00
sta stream_ve_index, x ;reset volume envelope index to the beginning
rts
That's it!
Changing Duty Cycles
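Now let's add an opcode that will change the duty cycle for a square stream. This one also takes one argument: the new value for stream_vol_duty. Only its upper four bits (the duty cycle plus the channel's other flags) matter, because the volume envelope overwrites the low four bits every frame.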
se_op_duty:
lda [sound_ptr], y ;read the argument (which duty cycle to change to)
sta stream_vol_duty, x ;store it.
rts
Done! Now we have the subroutine, but we still need to add it to our jump table:
sound_opcodes:
.word se_op_endsound ;this should be $A0
.word se_op_infinite_loop ;this should be $A1
.word se_op_change_ve ;this should be $A2
.word se_op_duty ;this should be $A3
;etc, 1 entry per subroutine
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
And it's ready to use:
;intro, don't loop this part
.byte quarter
.byte C4, C4, C4, C4
.loop_point: ;this is where we will loop back to.
.byte duty, $B0 ;change the duty cycle
.byte volume_envelope, ve_blip_echo ;change the volume envelope
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6 ;play some notes
.byte duty, $30 ;change the duty cycle
.byte volume_envelope, ve_short_staccato ;change volume envelope
.byte C5, Eb5, G5, C6, Eb6, half, G6 ;play some eighth notes and a half note
.byte loop ;loop to .loop_point
.word .loop_point
sound_engine.asm is getting pretty bulky with all these subroutines. It will only get bigger as we add more opcodes. It's nice to have all of our opcodes together in one place, but it's annoying to have to scroll around to find them. So let's pull all of our opcodes into their own file: sound_opcodes.asm. Then, at the bottom of sound_engine.asm, we can .include it:
.include "sound_opcodes.asm" ;our opcode subroutines, jump table and aliases
.include "note_table.i" ;period lookup table for notes
.include "note_length_table.i"
.include "vol_envelopes.i"
.include "song0.i" ;holds the data for song 0 (header and data streams)
.include "song1.i" ;holds the data for song 1
.include "song2.i"
.include "song3.i"
.include "song4.i"
.include "song5.i"
.include "song6.i" ;oooh.. new song!
I gave it the extension .asm because it contains code as well as data, and I like to be able to tell at a glance what files have what in them. Now whenever we want to add new opcodes, or tweak old ones, we have them nice and compact in their own file.
Updating Sound Data
Whenever we add new things to our sound engine, we have to think about how they will affect our old sound data. This week we added opcodes, which changes how our songs and sound effects terminate. Before, we were terminating them with $FF. That won't work anymore: $FF now falls in the opcode range ($A0-$FF) but has no entry in our jump table, so se_opcode_launcher would read an address from past the end of the table and jump to it. For songs, we should terminate with "loop" followed by an address to loop to. With sound effects we should terminate with the opcode "endsound". See the included songs and sound effects for examples.
RTS Tables
We talked about jump tables and indirect jumping this week. Another method for doing the same thing involves something called an RTS table and the RTS Trick. I won't cover it in these tutorials, but if you are curious to know how this works you can read this nesdev wiki article I wrote about the RTS Trick.
Putting It All Together
Download and unzip the sample files. Make sure the following files are in the same folder as NESASM3:
Double click opcodes.bat. That will run NESASM3 and should produce the opcodes.nes file. Run that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:
Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX
Song0 is a silence song. Not selectable.
Song1 is a boss song from The Guardian Legend. Now it loops!
Song2 is the same short sound effect from last week. Terminated with endsound.
Song3 is a song from Dragon Warrior. Now it loops!
Song4 is the same song4 as last week, but now it loops!
Song5 is a short sound effect, terminated with the endsound opcode.
Song6 should be familiar to readers of this forum. Do you recognize it? It utilizes opcodes for changing duty cycles and volume envelopes. Plus it loops!
Try adding your own songs and sound effects in. Try to add your own opcodes too. Here's some ideas for opcodes:
1. Trigger a sound effect mid-song
2. Implement duty cycle envelopes (similar to volume envelopes). Then make an opcode that allows you to change it.
3. Finite loops
Next Week: more opcode fun. Finite Loops, Changing Keys and Autom... .
This Week: Opcodes and Looping
So far our sound engine handles two type of data that it reads from music data streams: notes and note lengths. This is enough to write complex music but of course we are going to want more features. We will want control over the sound of our notes. What if we want to change duty cycles midstream? Or volume envelopes? Or keys? What if we want to loop one part of the song four times? Or loop the entire song continuously? What if we want to play a sound effect as part of a song?
All of these types of features, features where you are issuing commands to the engine, are going to be done through opcodes (also called control codes or command codes). An opcode is a value in the data stream that tells the engine to run a specific, specialized subroutine or piece of code. Most opcodes will have arguments sent along with them. For example, an opcode that changes a stream's volume envelope will come with an argument that specifies which volume envelope to change to.
We've actually been using an opcode for weeks, I just haven't mentioned it. It's the opcode that ends a sound, and we've been encoding it in our data streams as $FF. Here is the code we've been using:
;---snip--- (fetch a byte and range test)
.opcode: ;else it's an opcode
;do Opcode stuff
cmp #$FF
bne .end
lda stream_status, x ;if $FF, end of stream, so disable it and silence
and #%11111110
sta stream_status, x ;clear enable flag in status byte
lda stream_channel, x
beq .silence_tri ;triangle is silenced differently from squares and noise
lda #$30 ;squares and noise silenced with #$30
bne .silence
lda #$80 ;triangle silenced with #$80
sta stream_vol_duty, x ;store silence value in the stream's volume variable.
jmp .update_pointer ;done
;---snip--- (do note lengths and notes, update the stream's pointer)
Here we check if the byte read has a value of $FF. If so we turn the stream off and silence it. That's an opcode.
It would be pretty messy if every opcode we had was just written straight out like this. Normally we would pull this code into its own subroutine, like this:
;---snip--- (fetch a byte and range test)
.opcode: ;else it's an opcode
;do Opcode stuff
cmp #$FF ;end sound opcode
bne .end
jsr se_op_endsound ;call the endsound subroutine
jmp .fetch ;grab the next byte in the stream.
;---snip--- (do note lengths and notes, update the stream's pointer)
lda stream_status, x ;end of stream, so disable it and silence
and #%11111110
sta stream_status, x ;clear enable flag in status byte
lda stream_channel, x
beq .silence_tri ;triangle is silenced differently from squares and noise
lda #$30 ;squares and noise silenced with #$30
bne .silence
lda #$80 ;triangle silenced with #$80
sta stream_vol_duty, x ;store silence value in the stream's volume variable.
The .opcode branch is much shorter now. If we wanted to add more opcodes, we could just add some more compares:
;do Opcode stuff
cmp #$FF ;is it the end sound opcode?
bne .not_FF
jsr se_op_endsound ;if so, call the end sound subroutine
jmp .end ;and finish
cmp #$FE ;else is it the loop opcode?
bne .not_FE
jsr se_op_loop ;if so, call the loop subroutine
jmp .opcode_done
cmp #$FD ;else is it the change volume envelope opcode?
bne .not_FD
jsr se_op_change_ve ;if so, call the change volume envelope subroutine
jmp .opcode_done
iny ;update index to next byte in the data stream
jmp .fetch ;go fetch another byte
This will work, but it's ugly. The more opcodes we add to our engine, the more checks we need to make. What if we have 20 opcodes? Do we really want to do that many compares? It's a waste of ROM space and cycles.
Anytime you find yourself in a situation where you are doing a lot of CMPs on one value, the answer is to use a lookup table. It will simplify everything! We've done it already with notes, note lengths, song numbers and volume envelopes. Could you imagine trying to get a note's period without using the lookup table? It would look like this:
Is the note an A1? If so, use this period, else
Is the note an A#1? If so, use this period, else
Is the note a B1? If so, use this period, else
Is the note a C2? If so, use this period, else
... (about 100 more checks)
Is the note an F#9? If so, use this period, else
Is the note a rest? If so, use this period
That's just crazy. It would be hundreds of lines of unreadable code and you'd run into branch-range errors too. When we use a lookup table, the code is simplified to this:
;do Note stuff
sty sound_temp1 ;save our index
asl a
lda note_table, y
sta stream_note_LO, x
lda note_table+1, y
sta stream_note_HI, x
ldy sound_temp1 ;restore data stream index
Much cleaner. Again, I can't stress it enough: if you find yourself doing lots of CMPs on a single value, use a table instead!
With notes and note lengths we used a straight lookup table of values. With song numbers and volume envelopes we used a special type of lookup table called a pointer table, which stored data addresses. For opcodes we have two choices. We can use something called a jump table or we can use an RTS table. They are almost the same and the difference in performance between the two methods is negligible so for most programmers it's a matter of personal preference.
I prefer RTS tables myself, but we're going to use jump tables because they are easier to explain and understand.
Jump Tables
Ok, here's our problem: Our sound engine has opcodes. A lot of them, let's say 10 or more. Each opcode has its own subroutine. When our sound engine reads an opcode byte from the data stream, we want to avoid a long list of CMP and BNE instructions to select the right subroutine. How do we do that? We use a jump table.
A jump table is similar to a pointer table: it is a table of addresses. But whereas a pointer table holds addresses that point to the start of data, a jump table holds addresses that point to the start of code (ie, the start of subroutines). For example, suppose we have some subroutines:
lda #$00
ldx #$FF
adc #$03
sbc #$03
Here is how a jump table would look using these subroutines:
.word sub_a, sub_b, sub_c
Hey, that's pretty easy. We just use the subroutine label and the assembler will translate that into the address where the subroutine starts. Let's make a jump table for our sound opcode subroutines:
se_op_endsound:
    ;do stuff
    rts
se_op_infinite_loop:
    ;do stuff
    rts
se_op_change_ve:
    ;do stuff
    rts
;etc.. more subroutines
;this is our jump table
sound_opcodes:
    .word se_op_endsound
    .word se_op_infinite_loop
    .word se_op_change_ve
    ;etc, one entry per subroutine
Cool. We have a jump table now. So how do we use it?
Indirect Jumping
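The 6502 lets us do some cool things. One of those things is called an indirect jump. An indirect jump lets you stick a destination address into a pointer variable and jump there. The pointer doesn't have to live in the zero page (that's just a convenient place for it), but don't let its two bytes straddle a page boundary: if the low byte sits at an address like $02FF, the 6502 fetches the high byte from $0200 instead of $0300. It works like this: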
.rsset $0000
;first declare a pointer variable somewhere in the zero-page
jmp_ptr .rs 2 ;2 bytes because an address is always a word
lda #$00
sta jmp_ptr
lda #$80
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to $8000
Here we stick an address ($8000, lo byte first) into our jmp_ptr variable. Then we do an indirect jump by using the JMP instruction followed by a pointer variable in brackets:
jmp [jmp_ptr] ;indirect jump
This instruction translates into English as "Jump to the address that is stored in jmp_ptr and jmp_ptr+1". It's extremely useful. We can stick any address we want in there:
lda #$00
sta jmp_ptr
lda #$C0
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to $C000
We could read an address from ROM and use that if we wanted to, for example our reset vector:
lda $FFFC
sta jmp_ptr
lda $FFFD
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to our reset routine
And we can use it in combination with our jump table:
lda sound_opcodes, y ;read low byte of address from jump table
sta jmp_ptr
lda sound_opcodes+1, y ;read high byte
sta jmp_ptr+1
jmp [jmp_ptr] ;will jump to whatever address we pulled from the table.
Pretty powerful. We can dynamically jump to any section of code we want!
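To make the two-step "read the table, then jump through the pointer" idea concrete, here is a rough Python model. It's an illustration, not emulator code: memory is a plain 64KB bytearray, and the jump table address $C000 and the entries $9000/$9100 are made up. It also models the page-boundary quirk of the 6502's indirect JMP, which is why you shouldn't put a jump pointer at an address ending in $FF.

```python
mem = bytearray(0x10000)   # a flat 64KB address space, just for this model


def jmp_indirect_target(memory, ptr):
    """Address that JMP [ptr] jumps to on the NES's 6502 core.

    The lo byte comes from ptr. The hi byte should come from ptr+1, but the
    6502 never carries into the pointer's high byte, so a pointer at $xxFF
    reads its hi byte from $xx00.
    """
    lo = memory[ptr]
    hi = memory[(ptr & 0xFF00) | ((ptr + 1) & 0x00FF)]
    return (hi << 8) | lo


# jmp_ptr at $0000 holding $8000, as in the first example above
mem[0x0000], mem[0x0001] = 0x00, 0x80

# hypothetical jump table at $C000 with two entries: $9000 and $9100
mem[0xC000:0xC004] = bytes([0x00, 0x90, 0x00, 0x91])


def dispatch(index):
    """lda table,y / sta jmp_ptr / lda table+1,y / sta jmp_ptr+1 / jmp [jmp_ptr]"""
    y = index * 2                       # each table entry is a word
    mem[0x0000] = mem[0xC000 + y]       # low byte into jmp_ptr
    mem[0x0001] = mem[0xC000 + y + 1]   # high byte into jmp_ptr+1
    return jmp_indirect_target(mem, 0x0000)


# a pointer straddling a page boundary: lo at $02FF, intended hi at $0300
mem[0x02FF], mem[0x0300], mem[0x0200] = 0x34, 0x12, 0x56
```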
So we know how to build a jump table and we know how to do an indirect jump. Let's tie it all together and stick it into our sound engine. Let's start with se_fetch_byte. se_fetch_byte reads a byte from the data stream and range-checks it to see if it is a note, note length or opcode. Recall that notes have a byte range of $00-$7F. Note lengths have a range of $80-$9F. The opcode byte range is $A0-$FF:
se_fetch_byte:
    lda stream_ptr_LO, x
    sta sound_ptr
    lda stream_ptr_HI, x
    sta sound_ptr+1
    ldy #$00
.fetch:
    lda [sound_ptr], y
    bpl .note               ;if < #$80, it's a Note
    cmp #$A0
    bcc .note_length        ;else if < #$A0, it's a Note Length
.opcode:                    ;else ($A0-$FF) it's an opcode
    ;do Opcode stuff
.note_length:
    ;do note length stuff
.note:
    ;do note stuff
So we need to assign our opcodes to values between $A0 and $FF. Just as with notes and note lengths, the opcode byte we read from the data stream will be used as a table index (after subtracting $A0), so we will assign our opcodes in the same order as our table:
sound_opcodes:
    .word se_op_endsound        ;this should be $A0
    .word se_op_infinite_loop   ;this should be $A1
    .word se_op_change_ve       ;this should be $A2
    ;etc, 1 entry per subroutine
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1              ;be careful of conflicts here. this might be too generic. maybe song_loop is better
volume_envelope = $A2
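Here is a quick Python sketch of the byte ranges and the opcode-to-table-index arithmetic. It's a model of the rules described above, not the engine itself. Because the index is (opcode - $A0) * 2, there is room for at most 96 opcodes ($A0-$FF), and the largest offset ($BE) still fits in the Y register.

```python
ALIASES = {"endsound": 0xA0, "loop": 0xA1, "volume_envelope": 0xA2}


def classify(byte):
    """Same range checks as se_fetch_byte."""
    if byte < 0x80:          # bpl .note (bit 7 clear)
        return "note"
    if byte < 0xA0:          # cmp #$A0 / bcc .note_length
        return "note_length"
    return "opcode"          # $A0-$FF


def jump_table_offset(opcode):
    """Byte offset of an opcode's entry in sound_opcodes: (opcode - $A0) * 2."""
    return ((opcode - 0xA0) * 2) & 0xFF
```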
Now let's alter se_fetch_byte to take care of our opcodes:
se_fetch_byte:
    lda stream_ptr_LO, x
    sta sound_ptr
    lda stream_ptr_HI, x
    sta sound_ptr+1
    ldy #$00
.fetch:
    lda [sound_ptr], y
    bpl .note                   ;if < #$80, it's a Note
    cmp #$A0
    bcc .note_length            ;else if < #$A0, it's a Note Length
.opcode:                        ;else ($A0-$FF) it's an opcode
    ;do Opcode stuff
    jsr se_opcode_launcher      ;launch our opcode!!!
    iny                         ;next position in the data stream
    lda stream_status, x
    and #%00000001
    bne .fetch                  ;after our opcode is done, grab another byte unless the stream is disabled
    rts                         ; in which case we quit (explained below)
.note_length:
    ;do note length stuff
.note:
    ;do note stuff
I added a call to a subroutine called se_opcode_launcher and a little branch. Not a big change, is it? But there's an important detail here. se_opcode_launcher will be a short, simple subroutine that will read from the jump table and perform an indirect jump. It looks like this:
se_opcode_launcher:
    sty sound_temp1         ;save y register, because we are about to destroy it
    sec                     ;set carry before subtracting
    sbc #$A0                ;turn our opcode byte into a table index by subtracting $A0
                            ; $A0->$00, $A1->$01, $A2->$02, etc. Tables index from $00.
    asl a                   ;multiply by 2 because we index into a table of addresses (words)
    tay                     ;use it as our index into the jump table
    lda sound_opcodes, y    ;get low byte of subroutine address
    sta jmp_ptr
    lda sound_opcodes+1, y  ;get high byte
    sta jmp_ptr+1
    ldy sound_temp1         ;restore our y register
    iny                     ;set to next position in data stream (assume an argument)
    jmp [jmp_ptr]           ;indirect jump to our opcode subroutine
Short and simple. So why did I wrap this code in its own subroutine? Why not just stick this code as-is in the .opcode branch of se_fetch_byte? Because we need a place to return to.
The JSR and RTS instructions work as a pair. They go hand in hand. They need each other. Without going into too much detail, this is what goes on behind the scenes:
JSR sticks a return address on the stack and jumps to a subroutine. One way to look at it is to think of JSR as a JMP that remembers where it started from.
RTS pops the return address off the stack and jumps there.
So JSR leaves a treasure map for RTS to pick up and follow later. The key point here is that RTS expects a return address to be waiting for it on the stack.
Now our opcode subroutines all end in an RTS instruction. Do you see the potential problem here?
We call our opcode subroutines using an indirect jump. This requires us to use a JMP instruction, not a JSR instruction. A JMP instruction doesn't remember where it started from. No return address is pushed onto the stack with a JMP instruction. So when we jump to our opcode subroutine and hit the RTS instruction at the end, there is no return address waiting for us! The RTS will pull whatever random values happen to be on the stack at the time and jump there. We'll end up somewhere random and our program will surely crash!
To fix this, we wrap our indirect jump in a subroutine, se_opcode_launcher. We call it with a JSR instruction, completing the JSR/RTS pair:
jsr se_opcode_launcher ;this jsr will let us remember where we came from
This JSR instruction will stick a return address on the stack for us. Then inside se_opcode_launcher we perform our indirect jump to our desired opcode subroutine. Now when we hit that RTS instruction at the end of the opcode subroutine we have a return address waiting for us on the stack. Our program returns back to where we started. We are safe.
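If it helps to see the stack bookkeeping, here is a tiny Python model of just JSR, JMP and RTS. It's a simplification (no flags, no real instruction fetching), and the addresses $8000, $9000 and $A000 are hypothetical stand-ins for se_fetch_byte, se_opcode_launcher and an opcode subroutine. JSR pushes the address of its own last byte (hi byte first), and RTS pulls it and adds 1.

```python
class Cpu6502Stack:
    """Just enough of the 6502 to show what JSR, JMP and RTS do to the stack."""

    def __init__(self, pc):
        self.mem = bytearray(0x10000)
        self.sp = 0xFD   # arbitrary start; the stack lives in page $01 and grows down
        self.pc = pc

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def jsr(self, target):
        ret = (self.pc + 2) & 0xFFFF   # last byte of the 3-byte JSR instruction
        self.push(ret >> 8)            # hi byte first
        self.push(ret & 0xFF)
        self.pc = target

    def jmp(self, target):
        self.pc = target               # no return address is pushed

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF


# Good: jsr to the launcher, launcher jumps to the opcode, opcode ends in rts.
cpu = Cpu6502Stack(pc=0x8000)
cpu.jsr(0x9000)
cpu.jmp(0xA000)
cpu.rts()
good_return = cpu.pc   # the instruction right after the 3-byte jsr

# Bad: jump straight to the opcode. The rts pulls leftover stack bytes.
cpu = Cpu6502Stack(pc=0x8000)
cpu.mem[0x01FE], cpu.mem[0x01FF] = 0x34, 0x12   # whatever happened to be there
cpu.jmp(0xA000)
cpu.rts()
bad_return = cpu.pc
```

In the bad case the program "returns" to $1235, which is exactly the crash described above.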
Opcode Subroutines
With our opcode launcher written, we are all set up to make opcodes. We already have one written: the endsound opcode. This is the opcode we will use to terminate sound effects. Sound effects don't loop continuously like songs do, so they need to be stopped. Let's take a look again:
se_op_endsound:
    lda stream_status, x    ;end of stream, so disable it and silence
    and #%11111110
    sta stream_status, x    ;clear enable flag in status byte
    lda stream_channel, x
    cmp #TRIANGLE           ;is this the triangle channel?
    beq .silence_tri        ;triangle is silenced differently from squares and noise
    lda #$30                ;squares and noise silenced with #$30
    bne .silence            ; (this will always branch. bne is cheaper than a jmp)
.silence_tri:
    lda #$80                ;triangle silenced with #$80
.silence:
    sta stream_vol_duty, x  ;store silence value in the stream's volume variable.
    rts
This opcode is special. It's the reason for the check after the call to se_opcode_launcher:
.opcode: ;else ($A0-$FF) it's an opcode
;do Opcode stuff
jsr se_opcode_launcher
iny ;next position in the data stream
lda stream_status, x
and #%00000001
bne .fetch ;after our opcode is done, grab another byte unless the stream is disabled
rts ; in which case we quit (explained below)
Normally, we want se_fetch_byte to keep fetching bytes until it hits a note. Recall that with note lengths we jumped back to .fetch after setting the new note length. This is because after setting the length of the note, we needed to know WHAT note to play. So we fetch another byte. The same thing is true of opcodes. If we change the volume envelope with an opcode, great! But we still need to know what note to play next. If we use an opcode to switch our square's duty cycle, great! But we still need to know what note to play next. If we use an opcode to loop back to the beginning of the song, that's great! But we still need to read that first note of the song. This is why we jump back to fetch a byte after we run an opcode.
The ONE exception to this rule is when we end a sound effect. We are terminating the sound effect completely, so there is no next note. We don't want to fetch something that isn't there, so we need to skip the jump. That's why we check the status byte after we run the opcode. If the stream is disabled by the endsound opcode, we are finished. Otherwise, fetch another byte.
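Here is a loose Python model of that fetch loop: keep fetching through note lengths and opcodes until we hit a note, unless an opcode disabled the stream. It's a sketch under simplifying assumptions (a stream is a dict, the data is a list of bytes, and only three opcodes are handled); the argument values are made up.

```python
ENDSOUND, VOLUME_ENVELOPE, DUTY = 0xA0, 0xA2, 0xA3


def fetch(stream, data):
    """One call of se_fetch_byte: read until a note, or until the stream is disabled."""
    while True:
        byte = data[stream["pos"]]
        stream["pos"] += 1
        if byte < 0x80:                  # note: this is what we were looking for
            return byte
        if byte < 0xA0:                  # note length: remember it, keep fetching
            stream["length"] = byte
        elif byte == ENDSOUND:           # clears the enable flag...
            stream["enabled"] = False
            return None                  # ...so we stop instead of fetching again
        elif byte == VOLUME_ENVELOPE:    # one argument byte
            stream["ve"] = data[stream["pos"]]
            stream["pos"] += 1
        elif byte == DUTY:               # one argument byte
            stream["vol_duty"] = data[stream["pos"]]
            stream["pos"] += 1
        else:
            raise ValueError(f"opcode ${byte:02X} is not modeled here")


stream = {"pos": 0, "enabled": True}
sfx = [0x80, VOLUME_ENVELOPE, 0x02, DUTY, 0xB0, 0x1B, ENDSOUND]
first = fetch(stream, sfx)    # note length + two opcodes, then the note $1B
second = fetch(stream, sfx)   # endsound: stream disabled, nothing more to read
```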
The next opcode in our list is the loop opcode. This is the opcode that we will stick at the end of every song to tell the sound engine to play the song again, and again and again. It is actually quite easy to implement. It takes a 2-byte argument, which is the address to loop back to. The subroutine looks like this:
se_op_infinite_loop:
    lda [sound_ptr], y      ;read LO byte of the address argument from the data stream
    sta stream_ptr_LO, x    ;save as our new data stream position
    iny                     ;move to the HI byte of the argument
    lda [sound_ptr], y      ;read HI byte of the address argument from the data stream
    sta stream_ptr_HI, x    ;save as our new data stream position
    sta sound_ptr+1         ;update the pointer to reflect the new position.
    lda stream_ptr_LO, x
    sta sound_ptr
    ldy #$FF                ;after opcodes return, we do an iny. Since we reset
                            ;the stream buffer position, we will want y to start out at 0 again.
    rts
The first thing to notice about this subroutine is that it reads two bytes from the data stream. This is the address argument that gets passed along with the opcode. To make it clear, let's look at some example sound data:
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6, C5, Eb5, G5, C6, Eb6, half, G6 ;play some notes
.byte loop ;this alias evaluates to $A1, the loop opcode
.word song1_square1 ;this evaluates to the address of the song1_square1 label
;ie, the address we want to loop to.
After the "loop" opcode comes a word which is the address to loop back to. In this example I chose to loop back to the beginning of the stream data.
So what does our loop opcode do? It reads the first byte of this address argument (the low byte) and stores it in stream_ptr_LO. Then it reads the second byte of the address argument (the high byte) and stores it in stream_ptr_HI. These are the variables that keep track of our data stream position! The loop opcode just changes these values to some address that we specify. Not too complicated at all. The last step is to update the actual pointer (sound_ptr) so that the next byte we read from the data stream will be the first note we looped back to.
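The ldy #$FF line is the sneaky part. Here's a small Python model of what happens to the pointer and to Y, assuming some hypothetical song data at $C100 whose third byte is the loop opcode. When the opcode runs, Y already points at the first argument byte (the launcher's iny put it there), and the iny that se_fetch_byte does after the launcher returns wraps $FF around to $00.

```python
rom = bytearray(0x10000)
# hypothetical data at $C100: two notes, then loop, .word $C100
rom[0xC100:0xC105] = bytes([0x1B, 0x1D, 0xA1, 0x00, 0xC1])


def se_op_infinite_loop(memory, stream_ptr, y):
    """Model of the loop opcode; y indexes the first argument byte."""
    lo = memory[stream_ptr + y]   # lda [sound_ptr], y
    y = (y + 1) & 0xFF            # iny
    hi = memory[stream_ptr + y]   # lda [sound_ptr], y
    new_ptr = (hi << 8) | lo      # stream_ptr_LO/HI and sound_ptr now point here
    y = 0xFF                      # ldy #$FF
    return new_ptr, y


def after_return(y):
    """The iny in se_fetch_byte after jsr se_opcode_launcher (8-bit wrap)."""
    return (y + 1) & 0xFF


new_ptr, y = se_op_infinite_loop(rom, 0xC100, 3)
y_next = after_return(y)   # reads start again from the first byte at new_ptr
```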
In the example sound data above I looped back to the beginning of the stream data, but there's nothing stopping me from looping somewhere else:
;intro, don't loop this part
.byte quarter
.byte C4, C4, C4, C4
.loop_point: ;this is where we will loop back to.
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6, C5, Eb5, G5, C6, Eb6, half, G6
.byte loop ;this alias evaluates to $A1, the loop opcode
.word .loop_point ;this evaluates to the address of the .loop_point label
;ie, the address we want to loop to.
Technically we can also "loop" to a forward position, in which case it's actually more like a jump than a loop. That's all a loop is really: a jump... backwards.
Changing Volume Envelopes
Let's write the opcode subroutine to change volume envelopes. This one is even easier. It takes one argument: the number of the volume envelope to switch to:
se_op_change_ve:
    lda [sound_ptr], y      ;read the argument
    sta stream_ve, x        ;store it in our volume envelope variable
    lda #$00
    sta stream_ve_index, x  ;reset volume envelope index to the beginning
    rts
That's it!
Changing Duty Cycles
Now let's add an opcode that will change the duty cycle for a square stream. This one also takes one argument: which duty cycle to switch to.
se_op_duty:
    lda [sound_ptr], y      ;read the argument (which duty cycle to change to)
    sta stream_vol_duty, x  ;store it.
    rts
Done! Now we have the subroutine, but we still need to add it to our jump table:
sound_opcodes:
    .word se_op_endsound        ;this should be $A0
    .word se_op_infinite_loop   ;this should be $A1
    .word se_op_change_ve       ;this should be $A2
    .word se_op_duty            ;this should be $A3
    ;etc, 1 entry per subroutine
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
And it's ready to use:
;intro, don't loop this part
.byte quarter
.byte C4, C4, C4, C4
.loop_point: ;this is where we will loop back to.
.byte duty, $B0 ;change the duty cycle
.byte volume_envelope, ve_blip_echo ;change the volume envelope
.byte eighth ;set note length to eighth notes
.byte C5, E5, G5, C6, E6, G6 ;play some notes
.byte duty, $30 ;change the duty cycle
.byte volume_envelope, ve_short_staccato ;change volume envelope
.byte C5, Eb5, G5, C6, Eb6, half, G6 ;play some eighth notes and a half note
.byte loop ;loop to .loop_point
.word .loop_point
sound_engine.asm is getting pretty bulky with all these subroutines. It will only get bigger as we add more opcodes. It's nice to have all of our opcodes together in one place, but it's annoying to have to scroll around to find them. So let's pull all of our opcodes into their own file: sound_opcodes.asm. Then, at the bottom of sound_engine.asm, we can .include it:
.include "sound_opcodes.asm" ;our opcode subroutines, jump table and aliases
.include "note_table.i" ;period lookup table for notes
.include "note_length_table.i"
.include "vol_envelopes.i"
.include "song0.i" ;holds the data for song 0 (header and data streams)
.include "song1.i" ;holds the data for song 1
.include "song2.i"
.include "song3.i"
.include "song4.i"
.include "song5.i"
.include "song6.i" ;oooh.. new song!
I gave it the extension .asm because it contains code as well as data, and I like to be able to tell at a glance what files have what in them. Now whenever we want to add new opcodes, or tweak old ones, we have them nice and compact in their own file.
Updating Sound Data
Whenever we add new things to our sound engine, we have to think about how they will affect our old sound data. This week we added opcodes, which changes how our songs and sound effects terminate. Before, we terminated them with $FF. That won't work anymore: $FF now falls in the opcode range ($A0-$FF), and since there is no jump table entry for it, the opcode launcher would jump to a garbage address. For songs, we should terminate with "loop" followed by an address to loop to. With sound effects we should terminate with the opcode "endsound". See the included songs and sound effects for examples.
RTS Tables
We talked about jump tables and indirect jumping this week. Another method for doing the same thing involves something called an RTS table and the RTS Trick. I won't cover it in these tutorials, but if you are curious to know how this works you can read the nesdev wiki article I wrote about the RTS Trick.
Putting It All Together
Download and unzip the sample files. Make sure the following files are in the same folder as NESASM3:
Double click opcodes.bat. That will run NESASM3 and should produce the opcodes.nes file. Run that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:
Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX
Song0 is a silence song. Not selectable.
Song1 is a boss song from The Guardian Legend. Now it loops!
Song2 is the same short sound effect from last week. Terminated with endsound.
Song3 is a song from Dragon Warrior. Now it loops!
Song4 is the same song4 as last week, but now it loops!
Song5 is a short sound effect, terminated with the endsound opcode.
Song6 should be familiar to readers of this forum. Do you recognize it? It utilizes opcodes for changing duty cycles and volume envelopes. Plus it loops!
Try adding your own songs and sound effects in. Try to add your own opcodes too. Here's some ideas for opcodes:
1. Trigger a sound effect mid-song
2. Implement duty cycle envelopes (similar to volume envelopes). Then make an opcode that allows you to change it.
3. Finite loops
Next Week: more opcode fun. Finite Loops, Changing Keys and Autom... .
Nerdy Nights Sound: Part 9: Finite Loops, Key Changes, Chord Progressions
Last Week: Opcodes and Looping
This Week: More opcodes: Finite Loops, Key Changes, Chord Progressions
Last week we learned how to use opcodes. Opcodes allow a song's streams to call a subroutine mid-play. This is a very powerful tool. We learned some of the most common opcodes: infinite loop (really a jump), change volume envelopes and change duty cycles. Today we are going to expand on opcodes and learn some cool opcode tricks that can save us a lot (!) of bytes and time.
Finite Looping
Last week we added the infinite loop opcode, which was really just an unconditional jump back to an earlier part of the song. Today we're going to add a finite loop opcode. A finite loop opcode tells the sound engine to repeat a particular section of a song X times, where X is some number defined by you. In the Battle Kid theme song I added last week there is a passage that looks like this:
.byte sixteenth
.byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
.byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
.byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
.byte A3, C4, E4, A4, A3, E4, E3, E2
This is really just the same 4 notes repeated over and over again. Wouldn't it be cooler if we could do something like this instead:
.byte sixteenth
.byte A3, C4, E4, A4
.byte loop_13_times_please
.byte A3, E4, E3, E2
That saves a lot of bytes. We go from 56 bytes all the way down to around 10. The Battle Kid song actually plays this same phrase on both square channels, so really we go from 100+ bytes down to 20 or so. That's a big deal! If we consider how commonly repetitions of 4 or 8 occur in music, we can easily see that a finite loop opcode could potentially save us hundreds if not thousands of bytes in our sound data.
Finite Looping?
So what is a finite loop really? We saw that with an infinite loop it was really more like an unconditional jump. When the sound engine hits the infinite loop opcode, it jumps back, always, no matter what, no questions asked. A finite loop on the other hand is a conditional jump. It checks a counter. If the counter isn't 0 it jumps. If it is 0, it doesn't jump.
Loop Counter
First things first we need a loop counter. Each stream will have the ability to loop, so each stream will need its own loop counter:
stream_loop1 .rs 6 ;loop counter variable (one for each stream)
We will want to initialize this to 0 in our sound_load code:
lda #$00
sta stream_loop1, x
Next we will need a way to set this counter to some value. Some games bundle this up together in the finite loop opcode, but I prefer to make it its own opcode:
;this is our JUMP TABLE!
sound_opcodes:
.word se_op_endsound ;$A0
.word se_op_infinite_loop ;$A1
.word se_op_change_ve ;$A2
.word se_op_duty ;$A3
.word se_op_set_loop1_counter ;$A4
;etc, one entry per subroutine
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
se_op_set_loop1_counter:
    lda [sound_ptr], y      ;read the argument (# times to loop)
    sta stream_loop1, x     ;store it in the loop counter variable
    rts
Now we have an easy way to set the loop counter any time we want, like this:
;somewhere in sound data:
.byte set_loop1_counter, $04 ;repeat 4 times
Looping With The Counter
Our finite loop opcode will work like the infinite loop opcode, with two changes:
1) it will decrement the loop counter
2) it will check the result and only jump on a non-zero result
Let's write it:
;this is our JUMP TABLE!
sound_opcodes:
.word se_op_endsound ;$A0
.word se_op_infinite_loop ;$A1
.word se_op_change_ve ;$A2
.word se_op_duty ;$A3
.word se_op_set_loop1_counter ;$A4
.word se_op_loop1 ;$A5
;etc, one entry per subroutine
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5
se_op_loop1:
    dec stream_loop1, x     ;decrement the counter
    lda stream_loop1, x     ;and check it
    beq .last_iteration     ;if zero, we are done looping
.loop_back:
    lda [sound_ptr], y      ;read ptr LO from the data stream
    sta stream_ptr_LO, x    ;update our data stream position
    iny                     ;move to the HI byte of the argument
    lda [sound_ptr], y      ;read ptr HI from the data stream
    sta stream_ptr_HI, x    ;update our data stream position
    sta sound_ptr+1         ;update the pointer to reflect the new position.
    lda stream_ptr_LO, x
    sta sound_ptr
    ldy #$FF                ;after opcodes return, we do an iny. Since we reset
                            ;the stream buffer position, we will want y to start out at 0 again.
    rts
.last_iteration:
    iny                     ;skip the first byte of the address argument
                            ; the second byte will be skipped automatically upon return
                            ; (see se_fetch_byte. There is an "iny" after "jsr se_opcode_launcher")
    rts
Now we can loop. To use the Battle Kid example above, we go from this (56 bytes):
.byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
.byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
.byte A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4, A3, C4, E4, A4
.byte A3, C4, E4, A4, A3, E4, E3, E2
to this (13 bytes):
.byte set_loop1_counter, 13 ;repeat 13 times.
.intro_loop: ;make sure our loop point is AFTER we set the counter!
.byte A3, C4, E4, A4 ;the phrase to repeat.
.byte loop1 ;finite loop opcode
.word .intro_loop ;address to jump back to
.byte A3, E4, E3, E2 ;the last 4 notes
Pretty nice savings. Chances are we will be using this opcode set a lot.
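Here is a Python model of the counter logic, handy for checking that a loop really plays the number of times you expect. It's a sketch of the behavior, not engine code. Note the 8-bit decrement: because the counter is checked after dec, a count of 0 wraps to $FF and plays the phrase 256 times, so always set it to at least 1.

```python
def play_finite_loop(phrase, count, tail):
    """Notes played by: set_loop1_counter,count / .loop: phrase / loop1 .loop / tail."""
    played = []
    counter = count                      # set_loop1_counter
    while True:
        played += phrase                 # the notes between the label and loop1
        counter = (counter - 1) & 0xFF   # dec stream_loop1, x
        if counter == 0:                 # beq .last_iteration
            break
        # otherwise: jump back to the loop label and play the phrase again
    return played + tail


battle_kid_intro = play_finite_loop(["A3", "C4", "E4", "A4"], 13, ["A3", "E4", "E3", "E2"])
```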
We can save a few more bytes here. You may have noticed that the code in the .loop_back section of our finite loop opcode is identical to the infinite loop code:
.loop_back:
    lda [sound_ptr], y      ;read ptr LO from the data stream
    sta stream_ptr_LO, x    ;update our data stream position
    iny                     ;move to the HI byte of the argument
    lda [sound_ptr], y      ;read ptr HI from the data stream
    sta stream_ptr_HI, x    ;update our data stream position
    sta sound_ptr+1         ;update the pointer to reflect the new position.
    lda stream_ptr_LO, x
    sta sound_ptr
    ldy #$FF                ;after opcodes return, we do an iny. Since we reset
                            ;the stream buffer position, we will want y to start out at 0 again.
    rts
Compare with:
se_op_infinite_loop:
    lda [sound_ptr], y      ;read ptr LO from the data stream
    sta stream_ptr_LO, x    ;update our data stream position
    iny                     ;move to the HI byte of the argument
    lda [sound_ptr], y      ;read ptr HI from the data stream
    sta stream_ptr_HI, x    ;update our data stream position
    sta sound_ptr+1         ;update the pointer to reflect the new position.
    lda stream_ptr_LO, x
    sta sound_ptr
    ldy #$FF                ;after opcodes return, we do an iny. Since we reset
                            ;the stream buffer position, we will want y to start out at 0 again.
    rts
Why have identical code in two places? Let's cut out the whole .loop_back section and replace it with a "jmp se_op_infinite_loop":
se_op_loop1:
    dec stream_loop1, x         ;decrement the counter
    lda stream_loop1, x         ;check the counter
    beq .last_iteration         ;if zero, we are done looping
    jmp se_op_infinite_loop     ;if not zero, loop back
.last_iteration:
    iny                         ;skip the first byte of the address argument
                                ; the second byte will be skipped automatically upon return
                                ; (see se_fetch_byte after "jsr se_opcode_launcher")
    rts
Multiple Finite Loops
You may have been wondering why I named the finite loop opcode "loop1". Why stick a 1 on the end there? This is because sometimes one finite loop opcode isn't enough. Consider the following song structure. Assume each letter represents a long series of notes:
A A A B C A A A B C A A A B C A A A B C
With one finite loop opcode you could reduce it to this:
(A A A B C)x4
But if you had two finite loop opcodes available, you could nest them to reduce it even further:
(Ax3 B C)x4
If the music you write has a lot of patterns like this, it may be worth your while to have two or more finite loop opcodes available to you so that you can nest them. To add another finite loop opcode you need to:
1) declare another loop counter variable block in RAM (stream_loop2 .rs 6)
2) initialize the new loop counter to 0 in the sound_load routine.
3) add a new opcode for setting the new loop counter (se_op_set_loop2_counter)
4) add a new opcode to check the new counter and loop (se_op_loop2)
5) make sure to add the new opcodes to the jump table and give them an alias (set_loop2_counter, loop2).
Each finite loop opcode you add requires 6 bytes of RAM (a limited resource!), so please consider carefully if it is worth the tradeoff. It all depends on your music data.
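Here is a small Python model of nesting two loop counters, a sketch rather than engine code. Songs are written as a hypothetical token list: a note string, ("label", name), ("set", counter, n) or ("loop", counter, label). The important detail is that set_loop1_counter sits inside the outer loop, so the inner counter is reloaded on every outer pass.

```python
def run(program):
    """Expand a token program that uses two independent loop counters."""
    labels = {tok[1]: i for i, tok in enumerate(program)
              if isinstance(tok, tuple) and tok[0] == "label"}
    counters = {1: 0, 2: 0}           # stream_loop1 and stream_loop2
    played, pc = [], 0
    while pc < len(program):
        tok = program[pc]
        pc += 1
        if isinstance(tok, str):      # a note: play it
            played.append(tok)
        elif tok[0] == "set":         # set_loopN_counter
            counters[tok[1]] = tok[2]
        elif tok[0] == "loop":        # loopN: decrement, jump back while non-zero
            counters[tok[1]] = (counters[tok[1]] - 1) & 0xFF
            if counters[tok[1]] != 0:
                pc = labels[tok[2]]
        # ("label", name) tokens do nothing when reached
    return played


song = [
    ("set", 2, 4), ("label", "outer"),   # (... )x4
    ("set", 1, 3), ("label", "inner"),   # reload the inner counter every pass
    "A",
    ("loop", 1, "inner"),                # Ax3
    "B", "C",
    ("loop", 2, "outer"),
]
```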
Changing Keys
Another useful feature to have is the ability to change keys. Imagine you write a song and you have it all done. Then at the last minute you decide you want it to be in another key, say a whole step (2 semitones) lower. Rather than rewrite the whole song by hand (it takes forever), wouldn't it be nice if there was an opcode that you could set to automatically subtract two from every note? What if you have a song pattern that gets played in more than one key (a rhythm track for a Blues song, for example)? We could save lots of bytes if we can figure out a way to write the pattern once, and then loop it while changing keys each iteration. Let's do it.
Note Offset
We will implement keys by having a note offset variable:
stream_note_offset .rs 6 ;note offset
The note offset is a value that gets added to the note value before pulling the period out of the note_table. We will initialize stream_note_offset to 0 so that the default behavior is to add 0 to the note (resulting in no change). However, if we set stream_note_offset to some value via an opcode, it will change the notes. Here is an updated se_fetch_byte that demonstrates how this works:
;do Note stuff
sty sound_temp1 ;save our index into the data stream
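clc ;clear carry so the add doesn't pick up a stray +1
adc stream_note_offset, x ;add note offset
asl a ;multiply by 2 because note_table holds words
tay ;use it as our index into note_table
lda note_table, y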
sta stream_note_LO, x
lda note_table+1, y
sta stream_note_HI, x
ldy sound_temp1 ;restore data stream index
Imagine what would happen if we have stream_note_offset set to 2. Say we read a C4 note from the data stream:
1. A C4 note is equivalent to hex value #$1b (see aliases in note_table.i)
2. we add stream_note_offset to this value. #$1b + #$02 = #$1d.
3. hex value #$1d is equivalent to a D4 note (see note_table.i)
4. wow, we raised the note up a step!
Using the same value for stream_note_offset, if we had a string of notes like this:
C4, E4, G4, B4, C5, E5, G5, B5, C6 ;Cmaj7
it would get translated to:
D4, Fs4, A4, Cs5, D5, Fs5, A5, Cs6, D6 ;Dmaj7
Using stream_note_offset we can easily transpose entire sections of music into other keys. As mentioned above, we will initialize a stream's stream_note_offset to zero:
lda #$00
sta stream_note_offset, x
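Here's the same arithmetic as a tiny Python model, assuming the note indices from the walkthrough above (C4 = $1B, D4 = $1D). The add is 8 bits wide, which is why a negative offset like -2 (stored as $FE) also works: the sum wraps around. The result is then doubled by asl a, so it must still land on a valid note index, which means you shouldn't transpose below the bottom of note_table.

```python
C4, D4 = 0x1B, 0x1D   # note indices, as in the walkthrough above


def offset_byte(semitones):
    """How .byte stores a signed offset: -2 becomes $FE."""
    return semitones & 0xFF


def transpose(note, offset):
    """clc / adc stream_note_offset, x: an 8-bit add that wraps past $FF."""
    return (note + offset) & 0xFF


CMAJ7 = [C4 + i for i in (0, 4, 7, 11, 12, 16, 19, 23, 24)]  # C4 E4 G4 B4 C5 E5 G5 B5 C6
DMAJ7 = [transpose(n, offset_byte(2)) for n in CMAJ7]         # D4 Fs4 A4 Cs5 ... D6
```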
Set Note Offset
Now let's make an opcode that will set stream_note_offset to a specific value:
;this is our JUMP TABLE!
sound_opcodes:
.word se_op_endsound ;$A0
.word se_op_infinite_loop ;$A1
.word se_op_change_ve ;$A2
.word se_op_duty ;$A3
.word se_op_set_loop1_counter ;$A4
.word se_op_loop1 ;$A5
.word se_op_set_note_offset ;$A6
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5
set_note_offset = $A6
se_op_set_note_offset:
    lda [sound_ptr], y          ;read the argument
    sta stream_note_offset, x   ;set the note offset.
    rts
Now we can set the note offset anytime we want in the data stream:
;oops, after writing the song, I realized I wanted it to be in D instead. No problem.
.byte set_note_offset, 2
.byte C2, C3, C4, C5, ;etc.. more notes in the key of C.
Adjust Note Offset
Setting the note offset to a specific value has very limited application. It's like a one-time key change. More often we will want to set the note offset to some relative value. For example, instead of setting stream_note_offset to 2, we might want to set stream_note_offset to "the current offset + 2". If we had an opcode that let us adjust stream_note_offset by a relative value, we could use it together with loops. First let's write the opcode:
;this is our JUMP TABLE!
sound_opcodes:
.word se_op_endsound ;$A0
.word se_op_infinite_loop ;$A1
.word se_op_change_ve ;$A2
.word se_op_duty ;$A3
.word se_op_set_loop1_counter ;$A4
.word se_op_loop1 ;$A5
.word se_op_set_note_offset ;$A6
.word se_op_adjust_note_offset ;$A7
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5
set_note_offset = $A6
adjust_note_offset = $A7
se_op_adjust_note_offset:
    lda [sound_ptr], y          ;read the argument (what value to add)
    clc                         ;clear carry before adding
    adc stream_note_offset, x   ;add it to the current offset
    sta stream_note_offset, x   ;and save.
    rts
Let's look at this opcode in use. Say we have a long arpeggiated line like this:
C2, E2, G2, B2, C3, E3, G3, B3, C4, E4, G4, B4, C5, E5, G5, B5, C6, E6, G6, B6, C7 ;Cmaj7 (21 bytes)
This passage just repeats the same 4 notes (C E G B) over 5 octaves, so we can loop it:
.byte set_loop1_counter, 5 ;loop 5 times
.loop:
.byte C2, E2, G2, B2 ;these are the 4 notes to loop
.byte adjust_note_offset, 12 ;each iteration add 12 to the offset (ie, go up an octave)
.byte loop1
.word .loop
.byte C2 ;will be a C7. Cmaj7 (12 bytes)
The first time through the loop it will play C2, E2, G2, B2. The second time through the loop it will play C3, E3, G3, B3. The third time through will be C4, E4, G4, B4, etc. Using our opcodes, we reduce the size of our data from 21 bytes to 12 bytes. That's about 43% savings.
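To double-check the expansion, here is a short Python model of that loop (a sketch, assuming note indices counted from A1 = $00, so C2 = $03). Notice that the offset is still +60 when the loop exits, which is exactly why the trailing C2 comes out as a C7.

```python
C2, E2, G2, B2 = 0x03, 0x07, 0x0A, 0x0E   # note indices (A1 = $00)


def arpeggio(phrase, count, step, last):
    """set_loop1_counter,count / .loop: phrase / adjust_note_offset,step / loop1 / last."""
    offset, counter, played = 0, count, []
    while True:
        played += [(n + offset) & 0xFF for n in phrase]
        offset = (offset + step) & 0xFF      # adjust_note_offset
        counter = (counter - 1) & 0xFF       # loop1: dec and test
        if counter == 0:
            break
    played.append((last + offset) & 0xFF)    # the final note is transposed too
    return played


notes = arpeggio([C2, E2, G2, B2], 5, 12, C2)
```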
Battle Kid
To take a better example, let's look at the bassline to the Battle Kid theme song. Last week, it looked like this:
.byte eighth
.byte A3, A3, A4, A4, A3, A3, A4, A4
.byte G3, G3, G4, G4, G3, G3, G4, G4 ;down a step (-2)
.byte F3, F3, F4, F4, F3, F3, F4, F4 ;down a step (-2)
.byte Eb3, Eb3, Eb4, Eb4, Eb3, Eb3, Eb4, Eb4 ;down a step (-2)
.byte loop
.word song6_tri
;36 bytes
We have a pattern here: X3, X3, X4, X4, X3, X3, X4, X4, where X = some note. It just so happens that each new X is just the previous X minus 2. Using our new opcode, we can rewrite the bassline like this:
.byte eighth
.byte set_loop1_counter, 4 ;repeat 4 times
.loop:
.byte A3, A3, A4, A4, A3, A3, A4, A4 ;series of notes to repeat
.byte adjust_note_offset, -2 ;go down a step
.byte loop1
.word .loop
.byte set_note_offset, 0 ;after 4 repeats, reset note offset to 0.
.byte loop ;infinite loop
.word song6_tri
;21 bytes
We drop from 36 bytes to 21 bytes of ROM space. About 40% savings!
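Here is a Python model that expands the looped bassline back into plain notes so you can compare it with the original 32 notes. It's an illustration with a made-up token format, and it stops at set_note_offset instead of following the final infinite loop. Note indices assume A1 = 0, so A3 = 24 and A4 = 36.

```python
A3, A4 = 24, 36   # note indices (A1 = 0)


def run_stream(tokens):
    """Expand notes, loop1 and note-offset opcodes; returns (notes, final offset)."""
    labels = {t[1]: i for i, t in enumerate(tokens)
              if isinstance(t, tuple) and t[0] == "label"}
    pc, offset, loop1, played = 0, 0, 0, []
    while pc < len(tokens):
        t = tokens[pc]
        pc += 1
        if isinstance(t, int):                  # a note, shifted by the offset
            played.append((t + offset) & 0xFF)
        elif t[0] == "set_loop1_counter":
            loop1 = t[1]
        elif t[0] == "adjust_note_offset":
            offset = (offset + t[1]) & 0xFF     # -2 wraps to $FE
        elif t[0] == "set_note_offset":
            offset = t[1] & 0xFF
        elif t[0] == "loop1":
            loop1 = (loop1 - 1) & 0xFF
            if loop1:
                pc = labels[t[1]]
    return played, offset


bassline = [
    ("set_loop1_counter", 4),
    ("label", "loop"),
    A3, A3, A4, A4, A3, A3, A4, A4,
    ("adjust_note_offset", -2),
    ("loop1", "loop"),
    ("set_note_offset", 0),
]
played, final_offset = run_stream(bassline)
```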
Loopy Sound Effects
We can produce some cool sound effects if we combine loops and key changes at high tempos. Look at this one (tempo is $FF):
.byte set_loop1_counter, $08 ;repeat 8 times
.loop:
.byte thirtysecond, D7, D6, G6 ;play two D notes at different octaves and a G. Pretty random
.byte adjust_note_offset, -4 ;go down 2 steps
.byte loop1
.word .loop
.byte endsound
This sound effect plays a simple 3-note pattern in descending keys super fast. The sound data is only 12 bytes, but it produces a pretty complex sound effect. Listen to song7 in this week's sample files to hear it. By experimenting with loops like this we can come up with some sounds that would be difficult to compose by hand.
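If you're curious what those 12 bytes actually expand to, this Python sketch lists the 24 note indices they generate (assuming A1 = 0, so D6 = 53, G6 = 58 and D7 = 65). Each pass drops the whole 3-note figure by 4 semitones.

```python
D6, G6, D7 = 53, 58, 65   # note indices (A1 = 0)

notes = []
offset = 0
for _ in range(8):                                      # set_loop1_counter, $08 ... loop1
    notes += [(n + offset) & 0xFF for n in (D7, D6, G6)]
    offset = (offset - 4) & 0xFF                        # adjust_note_offset, -4
```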
Complex Chord Progressions
We made some good savings percentage-wise on the bassline to Battle Kid. But we were lucky. The chord progression went down in consistent steps: -2, -2, -2. It was possible to loop this because we adjust the note_offset by the same value (-2) each time. But what if we had a pattern that was repeated in a more complicated way? We do. Let's look at the rhythm pattern for our Guardian Legend boss song:
song1_square1:
    .byte eighth
    .byte A2, A2, A2, A3, A2, A3, A2, A3
    .byte F3, F3, F3, F4, F3, F4, F3, F4 ;+8 (A2 + 8 = F3)
    .byte A2, A2, A2, A3, A2, A3, A2, A3 ;-8
    .byte F3, F3, F3, F4, F3, F4, F3, F4 ;+8
    .byte E3, E3, E3, E4, E3, E4, E3, E4 ;-1
    .byte E3, E3, E3, E4, E3, E4, E3, E4 ;+0
    .byte Ds3, Ds3, Ds3, Ds4, Ds3, Ds4, Ds3, Ds4 ;-1
    .byte D3, D3, D3, D4, D3, D4, D3, D4 ;-1
    .byte C3, C3, C3, C4, C3, C4, C3, C4 ;-2
    .byte B2, B2, B2, B3, B2, B3, B2, B3 ;-1
    .byte As2, As2, As2, As3, As2, As3, As2, As3 ;-1
    .byte A2, A2, A2, A3, A2, A3, A2, A3 ;-1
    .byte Gs2, Gs2, Gs2, Gs3, Gs2, Gs3, Gs2, Gs3 ;-1
    .byte G2, G2, G2, G3, G2, G3, G2, G3 ;-1
    .byte loop ;+2 (loop back to A2)
    .word song1_square1
Here we have another pattern: Xn, Xn, Xn, X(n+1), Xn, X(n+1), Xn, X(n+1), where X is some note and n is some octave. Cool. A pattern means we have an opportunity to save bytes by looping. But wait. Unlike Battle Kid, this pattern jumps around in an inconsistent way. What should we do?
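Before answering, note that the interval comments above can be computed mechanically from the first note of each row. This Python sketch does that. The semitone numbering (octave * 12 + pitch class, C = 0) is an assumption for illustration, not the engine's note-table values.

```python
# Sketch: derive the interval column of the Guardian Legend rhythm from the
# first note of each 8-note row (semitone numbering is an assumption).
PITCH = {"C": 0, "Cs": 1, "D": 2, "Ds": 3, "E": 4, "F": 5,
         "Fs": 6, "G": 7, "Gs": 8, "A": 9, "As": 10, "B": 11}


def note(name):
    """Convert a name like 'Ds3' to a semitone number."""
    return int(name[-1]) * 12 + PITCH[name[:-1]]


rows = ["A2", "F3", "A2", "F3", "E3", "E3", "Ds3", "D3",
        "C3", "B2", "As2", "A2", "Gs2", "G2"]
roots = [note(r) for r in rows]
# Change from each row to the next. The last entry wraps back to the first
# row, which is what the infinite `loop` does.
deltas = [roots[(i + 1) % len(roots)] - roots[i] for i in range(len(roots))]
print(deltas)
```

The deltas sum to zero, which is why the pattern lands back on A2 when the song loops.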
Super TGL Transposition Trick
I learned this trick from The Guardian Legend, so I call it the TGL Transposition Trick. We loop the pattern, and then use the loop counter as an index into a lookup table. The lookup table contains note offset values. Because the loop counter counts down, our lookup table is stored in reverse order.
Wait, what? Let's look at our example:
song1_square1:
    .byte eighth
    .byte set_loop1_counter, 14 ;repeat 14 times
.loop:
    .byte A2, A2, A2, A3, A2, A3, A2, A3
    ;pull a value from lookup_table and
    ; add it to stream_note_offset
    .byte loop1 ;finite loop (14 times)
    .word .loop
    .byte loop ;infinite loop
    .word song1_square1
.lookup_table:
    .byte 2, -1, -1, -1, -1, -1, -2
    .byte -1, -1, 0, -1, 8, -8, 8 ;14 entries long, reverse order
I'm going to break it down in a second, but first let me tell you that the commented placeholder above (pull a value from lookup_table and add it to stream_note_offset) will be covered by a single opcode, transpose. The transpose opcode takes a 2-byte argument, so altogether that placeholder will be replaced with 3 bytes of data. If we count up all of the bytes in our rhythm sound data, we get 34 bytes. The original was 116 bytes. By using the TGL Transposition Trick, we save 82 bytes. That's about 70%!
song1_square1:
    .byte eighth
    .byte set_loop1_counter, 14 ;repeat 14 times
.loop:
    .byte A2, A2, A2, A3, A2, A3, A2, A3
    .byte transpose ;the transpose opcode takes a 2-byte argument
    .word .lookup_table ;which is the address of the lookup table
    .byte loop1 ;finite loop (14 times)
    .word .loop
    .byte loop ;infinite loop
    .word song1_square1
.lookup_table:
    .byte 2, -1, -1, -1, -1, -1, -2
    .byte -1, -1, 0, -1, 8, -8, 8 ;14 entries long, reverse order
    ;*** altogether 34 bytes ***
The transpose opcode will set up a pointer variable to point to the lookup table. Then it will take the loop counter, subtract 1, and use the result as an index into the table. We subtract 1 because the tables index from zero. If we loop 14 times, our table will have 14 entries numbered 0-13. Once the transpose opcode has its index, it will pull a value from the table. This value will be added to stream_note_offset.
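Here is the index arithmetic on its own, as a Python sketch. `transpose_value` is a hypothetical helper, not an engine routine. The deltas are listed in playback order, one per iteration, and the table is just that list reversed. Reading entry counter - 1 then hands out the deltas in the right order as the counter counts down from 14.

```python
# Sketch of how the lookup table lines up with a count-down loop counter.
# `deltas` is the change applied after each iteration, in playback order.
deltas = [8, -8, 8, -1, 0, -1, -1, -2, -1, -1, -1, -1, -1, 2]

# loop1 counts 14, 13, ..., 1, and the opcode reads index counter - 1,
# so storing the deltas reversed makes iteration 1 read the last entry.
lookup_table = list(reversed(deltas))


def transpose_value(table, loop_counter):
    """Value the transpose opcode adds: table[loop_counter - 1]."""
    return table[loop_counter - 1]


print(lookup_table)
print([transpose_value(lookup_table, c) for c in range(14, 0, -1)] == deltas)
```

The reversed list matches the two `.byte` lines of the table above entry for entry.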
Before we write the opcode, let's trace through the data to see how it works. We'll start at the very first byte of song1_square1:
1) set note length to eighth notes
2) set the loop counter to 14
(.loop iteration 1)
3) play a series of notes: A2, A2, A2, A3, A2, A3, A2, A3
4) transpose opcode. Set up a pointer to lookup_table. Use our loop counter, minus one, as an index. The loop counter is 14 now, so we will pull out .lookup_table+13, which is an 8. Add 8 to the current stream_note_offset: 0 + 8 = 8.
5) decrement the loop counter (14->13) and loop back to the .loop label
(iteration 2)
6) our new string of notes with the +8: F3, F3, F3, F4, F3, F4, F3, F4.
7) transpose opcode. Loop counter is 13. Grab .lookup_table+12, which is -8. Add -8 to stream_note_offset: 8 + -8 = 0.
8) decrement loop counter (13->12) and loop back to .loop label
(iteration 3)
9) our new string of notes with the +0: A2, A2, A2, A3, A2, A3, A2, A3
10) transpose opcode. Loop counter is 12. Grab .lookup_table+11, which is 8. Add 8 to stream_note_offset: 0 + 8 = 8.
11) decrement loop counter (12->11) and loop back to .loop label
(iteration 4)
12) our new string of notes with the +8: F3, F3, F3, F4, F3, F4, F3, F4.
13) transpose opcode. Loop counter is 11. Grab .lookup_table+10, which is -1. Add -1 to stream_note_offset: 8 + -1 = 7.
14) decrement loop counter (11->10) and loop back to .loop label
(iteration 5)
15) our new string of notes with the +7: E3, E3, E3, E4, E3, E4, E3, E4.
16) transpose opcode. Loop counter is 10. Grab .lookup_table+9, which is 0. Add 0 to stream_note_offset: 7 + 0 = 7.
17) decrement loop counter (10->9) and loop back to .loop label
etc. On the last iteration our loop counter is 1. We grab .lookup_table+0 and add it to stream_note_offset. Then we decrement the loop counter (1->0). Our loop counter is now 0, so our loop breaks. Pretty cool, no? Let's write it.
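Before writing the 6502 version, you can double-check the whole trace with a small Python model of the stream. It is a sketch, not engine code. Notes are assumed to be plain semitone numbers, and the model records only the first note of each 8-note pass.

```python
# Python model of the traced stream: loop1 with a transpose after each pass.
# Semitone numbering (octave * 12 + pitch class, C = 0) is an assumption.
NAMES = ["C", "Cs", "D", "Ds", "E", "F", "Fs", "G", "Gs", "A", "As", "B"]
A2 = 2 * 12 + 9
lookup_table = [2, -1, -1, -1, -1, -1, -2, -1, -1, 0, -1, 8, -8, 8]


def run_pass(loop_count=14):
    """Return the first note of each iteration and the final note offset."""
    counter, offset, firsts = loop_count, 0, []
    while True:
        n = A2 + offset                       # first note of the pattern, transposed
        firsts.append(f"{NAMES[n % 12]}{n // 12}")
        offset += lookup_table[counter - 1]   # transpose: table[counter - 1]
        counter -= 1                          # loop1: decrement, then test
        if counter == 0:
            return firsts, offset


firsts, final_offset = run_pass()
print(firsts)
print(final_offset)
```

The first notes come out A2, F3, A2, F3, E3, E3, Ds3, D3, C3, B2, As2, A2, Gs2, G2, matching the original 116-byte data. The offset ends at 0, so the infinite loop starts the next pass in the right key.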
;this is our JUMP TABLE!
    .word se_op_endsound ;$A0
    .word se_op_infinite_loop ;$A1
    .word se_op_change_ve ;$A2
    .word se_op_duty ;$A3
    .word se_op_set_loop1_counter ;$A4
    .word se_op_loop1 ;$A5
    .word se_op_set_note_offset ;$A6
    .word se_op_adjust_note_offset ;$A7
    .word se_op_transpose ;$A8
;these are aliases to use in the sound data.
endsound = $A0
loop = $A1
volume_envelope = $A2
duty = $A3
set_loop1_counter = $A4
loop1 = $A5
set_note_offset = $A6
adjust_note_offset = $A7
transpose = $A8
se_op_transpose:
    lda [sound_ptr], y ;read low byte of the pointer to our lookup table
    sta sound_ptr2 ;store it in a new pointer variable
    iny ;move to the high byte of the argument
    lda [sound_ptr], y ;read high byte of pointer to table
    sta sound_ptr2+1
    sty sound_temp ;save y because we are about to destroy it
    lda stream_loop1, x ;get loop counter, put it in Y
    tay ; this will be our index into the lookup table
    dey ;subtract 1 because indexes start from 0.
    lda [sound_ptr2], y ;read a value from the table.
    clc ;clear carry so adc adds only the table value
    adc stream_note_offset, x ;add it to the note offset
    sta stream_note_offset, x
    ldy sound_temp ;restore Y
    rts
There is a new pointer variable here, sound_ptr2. Actually, what I really did was rename jmp_ptr to sound_ptr2. The new name lets me know it's for sound engine use only. Since we are finished with jmp_ptr as soon as we jump, there are no pointer conflicts here.
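Two details in this routine are easy to miss. First, the table holds negative numbers like -1, which end up in ROM as two's-complement bytes ($FF). Second, adc always adds the carry flag as well, which is why clc comes first. This Python sketch models just the 8-bit add. `adc` and `as_byte` are illustrative helpers, not engine routines, and they ignore the other CPU flags.

```python
# Sketch of the 8-bit arithmetic behind `clc` / `adc` in se_op_transpose.
# Models only the accumulator result and the carry flag.


def adc(a, value, carry):
    """6502-style add with carry: returns (result byte, carry out)."""
    total = a + value + carry
    return total & 0xFF, int(total > 0xFF)


def as_byte(n):
    """How a table entry like -1 is stored: two's complement in 8 bits."""
    return n & 0xFF


offset = 8                                  # stream_note_offset after iteration 1
offset, _ = adc(offset, as_byte(-8), 0)     # clc first, so carry-in is 0
print(offset)                               # 0

stale = adc(7, as_byte(-1), 1)[0]           # forgetting clc while carry is set
print(stale)                                # 7, not 6
```

Adding $F8 to 8 wraps past 255 back to 0, so "adding a negative number" works with plain 8-bit addition. A leftover carry, though, silently adds one more.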
This is just an example of how clever use of opcodes and looping can save you lots of bytes. Keep in mind that this transpose opcode is only useful if you write music that has repeating patterns in the rhythm section. If you don't, then save yourself some bytes and cut the opcode from your sound engine.
Putting It All Together
Download and unzip the sample files. Make sure the following files are in the same folder as NESASM3:
Double click opcodes2.bat. That will run NESASM3 and should produce the opcodes2.nes file. Run that NES file in FCEUXD SP.
Use the controller to select songs and play them. Controls are as follows:
Up: Play
Down: Stop
Right: Next Song/SFX
Left: Previous Song/SFX
Song0 is a silence song and is not selectable.
Song1-Song6 are the same as last week, but they take up less ROM space now.
Song7 is a new sound effect created by looping a key change at high tempo.
As usual, try adding your own songs and sound effects using the new opcodes. Experiment.
Next Week: Noise, Simple Drums
Main Tutorial Series
Nerdy Nights intro
Nerdy Nights week 1: number systems and core programming ideas
Nerdy Nights week 2: NES architecture overview
Nerdy Nights week 3: 6502 ASM, first app
Nerdy Nights week 4: Color Palettes, Sprites, second app
Nerdy Nights week 5: multiple sprites, reading controllers, more instructions
Nerdy Nights week 6: Backgrounds
Nerdy Nights week 7: subroutines, game layout, starting Pong
Nerdy Nights week 8: 16 bit math, pointers, nested loops
Nerdy Nights week 9: Numbers, Bin to Dec
Nerdy Nights intro
This is going to be a series of weekly NES programming lessons, starting from absolutely no knowledge. Right now the plan is 16-20 lessons ending with a complete game like pong or breakout. The first lessons may be easy but they will get harder! People who have done any type of programming before may have an easier time, but anyone should be able to make it through. All the tools will be Windows-based, so Linux users will have to use Wine and MacOS users will have to use Parallels, Boot Camp, or VirtualPC.
Many things will simply not be covered in this series. No audio or scrolling will be done, and only the NROM mapper will be used. After you make it through all the lessons, those topics will be much easier to learn than they would be when you are first starting out.
And finally, these lessons will probably contain some errors at first and may not show the absolute best way to do anything. People develop their own programming styles, and this is just mine, which tends to be quickly written and not super efficient.
Nerdy Nights week 1: number systems and core programming ideas
Number Systems
The decimal system is base 10. Every digit can be 0-9. Each digit place is a power of 10. Each digit place to the left is 10 times more than the previous digit place. If you take the number 10 and put a 0 to the right, it becomes 100 which is 10 times more. Remove the 0 from the right, it becomes 1 which is 10 times less.
Digits (100's place, 10's place, 1's place):
0 0 1 = 001
0 1 0 = 010
1 0 0 = 100
To get the value of a number, you multiply each digit by its place value and add them all together.
Digits (100's place, 10's place, 1's place):
3 8 0 = 3*100 + 8*10 + 0*1 = 380
0 4 1 = 0*100 + 4*10 + 1*1 = 41
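The same multiply-and-add rule works for any base, so you can write it once. Here is a minimal Python sketch (not NES code; `place_value_sum` is just an illustrative name):

```python
def place_value_sum(digits, base=10):
    """Multiply each digit by its place value and add them all together."""
    total = 0
    for place, digit in enumerate(reversed(digits)):
        total += digit * base ** place    # 1's place, then 10's, then 100's...
    return total


print(place_value_sum([3, 8, 0]))         # 380
print(place_value_sum([0, 4, 1]))         # 41
print(place_value_sum([1, 1, 1, 1], 2))   # 15, the same rule in binary
```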
Everything in computers is done in base 2, binary. This is because the lowest level of computing is a switch; on/off, 1/0.
Base 2 binary works the same way, except each digit can be 0-1 and the place values are powers of 2 instead of 10. Insert a 0 to the right of a number and it becomes 2 times bigger. Remove a 0 and it becomes 2 times smaller.
Digits (8's place, 4's place, 2's place, 1's place):
0 1 0 0 = 0*8 + 1*4 + 0*2 + 0*1 = 4
1 1 1 1 = 1*8 + 1*4 + 1*2 + 1*1 = 15
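In code, adding or removing a 0 on the right of a binary number is a shift. Here is a quick Python check for illustration. On the 6502, the matching instructions are ASL (shift left) and LSR (shift right).

```python
n = 0b0100              # 4
bigger = n << 1         # append a 0 on the right: 0b1000 = 8
smaller = n >> 1        # remove the rightmost 0: 0b10 = 2
print(bigger, smaller)  # 8 2
print(int("1111", 2))   # 15
```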
The NES is an 8 bit system, which means the binary numbers it works with are 8 binary digits long. 8 bits make one byte. Some examples are:
Binary = Decimal
00000000 = 0
00001111 = 15
00010000 = 16
10101010 = 170
11111111 = 255
Eventually you become fast at reading binary numbers, or at least recognizing patterns. You can see that one byte can only range from 0-255. For numbers bigger than that you must use 2 or more bytes. There are also no negative numbers. More on that later.
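To see how a bigger number spreads over two bytes, here is a short Python sketch. It splits 1000 into a low byte and a high byte. The 6502 stores multi-byte values low byte first (little-endian).

```python
value = 1000
low = value & 0xFF                   # 1000 % 256 = 232 ($E8)
high = value >> 8                    # 1000 // 256 = 3 ($03)
print(low, high)                     # 232 3
print(high * 256 + low)              # 1000, put back together
print(value.to_bytes(2, "little"))   # low byte first, as the 6502 stores it
```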
Hexadecimal, or hex, is base 16, so each digit is 0-15 and each digit place is a power of 16. The problem is that anything 10 and above would need 2 decimal digits. To fix this, letters starting with A are used for the values 10 through 15:
Decimal = Hex
0 = 0
1 = 1
9 = 9
10 = A
11 = B
12 = C
13 = D
14 = E
15 = F
As with decimal and binary, the digit places are each a power of the base, which for hex is 16:
Digits (16's place, 1's place):
6 A = 6*16 + A(10)*1 = 106
1 0 = 1*16 + 0*1 = 16
Hex is largely used because it is much faster to write than binary. An 8 digit binary number turns into a 2 digit hex number:
Binary:          01101010
split in half:   0110  1010
into hex:        6     A
put back:        6A
So 01101010 = 6A.
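The split-in-half trick is easy to express with bit masks. Here is a Python sketch, where `byte_to_hex` is an illustrative helper:

```python
HEX_DIGITS = "0123456789ABCDEF"


def byte_to_hex(b):
    """Split a byte into two 4-bit halves and map each half to a hex digit."""
    high = (b >> 4) & 0x0F      # 0110 -> 6
    low = b & 0x0F              # 1010 -> 10 = A
    return HEX_DIGITS[high] + HEX_DIGITS[low]


print(byte_to_hex(0b01101010))  # 6A
print(byte_to_hex(0b10101010))  # AA
```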
And more examples:
Binary = Hex = Decimal
00000000 = 00 = 0
00001111 = 0F = 15
00010000 = 10 = 16
10101010 = AA = 170
11111111 = FF = 255
For easy converting, open the built-in Windows calculator and switch it to scientific mode (called Programmer mode in newer versions of Windows). Choose the base (Hex, Dec, or Bin), type the number, then switch to another base.
When numbers are written, an extra character is added so you can tell which base is being used. Binary is typically prefixed with a %, like %00001111. Hex is prefixed with a $, like $2A. Some other conventions are postfixing binary with a b, like 00001111b, and postfixing hex with an h, like 2Ah.
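As practice with the notations, here is a hypothetical Python helper that reads a number written in any of them. It is only an illustration. The code in these lessons uses the % and $ prefixes.

```python
def parse_number(text):
    """Read %binary, $hex, binary with a b suffix, hex with an h suffix, or decimal."""
    if text.startswith("%"):
        return int(text[1:], 2)      # %00001111
    if text.startswith("$"):
        return int(text[1:], 16)     # $2A
    if text.endswith(("h", "H")):
        return int(text[:-1], 16)    # 2Ah
    if text.endswith(("b", "B")):
        return int(text[:-1], 2)     # 00001111b
    return int(text, 10)             # plain decimal


for t in ("%00001111", "$2A", "00001111b", "2Ah", "42"):
    print(t, parse_number(t))
```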
The NES has a 16 bit address bus (more on that later), so it can access 2^16 bytes of memory. 16 binary digits turns into 4 hex digits, so typical NES addresses look like $8000, $FFFF, and $4017.
Core Programming Concepts
All programming languages have three basic concepts. They are instructions, variables, and control flow. If any of those three are missing it is no longer a true programming language. For example HTML has no control flow so it is not a programming language.
An instruction is the smallest command that the processor runs. Instructions are run one at a time, one after another. In the NES processor there are only 56 instructions. Typically around 10 of those will be used constantly, and at least 10 will be completely ignored. Some examples of these would be addition, loading a number, or comparing a variable to zero.
A variable is a place that stores data that can be modified. An example of this would be the horizontal position of Mario on the screen. It can be changed any time during the game. Variables in source code all have names you set, so it would be something like MarioHorizPosition.
Control Flow
Normally your instructions run in sequential order. Sometimes you will want to run a different section of code depending on a variable. This would be a control flow statement which changes the normal flow of your program. An example would be if Mario is falling, jump to the code that checks if he hit the ground yet.
NEXT WEEK: basic NES architecture
Nerdy Nights week 2: NES architecture overview
Previous week: number systems and core programming ideas
This week: a general overview of the NES architecture, covering the major components. All general purpose computers are arranged the same way, with a place to store code (ROM), a place to store variables (RAM), and a processor to run code (CPU). The NES also adds another processor to generate the graphics (PPU) and a section of the CPU to generate audio (APU). Everything here is very general; the next few weeks will go into more detail than you want.
NES System Architecture
KB - Memory size is listed in KiloBytes or KB. 1KB = 1024 bytes. Memory sizes are powers of 2, so 2^10 = 1024 is used instead of 1000. If the capitalization is different, the meaning can change: Kb is Kilobits. Divide Kb by 8 to get KB, because 1 byte = 8 bits.
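Here is the KB/Kb arithmetic in Python. The 256Kb chip is a made-up example:

```python
BYTES_PER_KB = 2 ** 10   # 1024, not 1000
BITS_PER_BYTE = 8


def kilobits_to_kilobytes(kilobits):
    """Kb (kilobits) divided by 8 gives KB (kilobytes)."""
    return kilobits / BITS_PER_BYTE


print(kilobits_to_kilobytes(256))   # 32.0, so a 256Kb chip holds 32KB
print(32 * BYTES_PER_KB)            # 32768 bytes
```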
ROM - Read Only Memory, holds data that cannot be changed. This is where the game code or graphics are stored on the cart.
RAM - Random Access Memory, holds data that can be read and written. When power is removed, the chip is erased. A battery can be used to keep power and data valid.
PRG - Program memory, the code for the game
CHR - Character memory, the data for graphics
CPU - Central Processing Unit, the main processor chip
PPU - Picture Processing Unit, the graphics chip
APU - Audio Processing Unit, the sound chip inside the CPU
System Overview
The NES includes a custom 6502 based CPU with the APU and controller handling inside one chip, and a PPU that displays graphics in another chip. Your code instructions run on the CPU and send out commands to the APU and PPU. The NOAC (NES On A Chip) clones like the Yobo and NEX put all of these parts onto one chip.
There is only 2KB of RAM connected to the CPU for storing variables, and 2KB of RAM connected to the PPU for holding two TV screens of background graphics. Some carts add extra CPU RAM, called Work RAM or WRAM. If a cart needs to store saved games, this WRAM will have a battery attached to make sure it isn't erased. A few carts add extra PPU RAM to hold four screens of background graphics at once. This is not common. The rest of this tutorial will not use WRAM or four screen RAM.
Each cart includes at least three chips. One holds the program code (PRG), another holds the character graphics (CHR), and the last is the lockout. The graphics chip can be RAM instead of ROM, which means the game code would copy graphics from the PRG ROM chip to the CHR RAM. PRG is always a ROM chip.
Lockout Chip
There are also two lockout chips, one inside the NES and one inside the cart. The lockout chip controls resetting the console. First the NES lockout sends out a stream ID, 0-15. The cart lockout records this number. Then both lockout chips run a complex equation using that number and send the results to each other. Both chips know what the other is supposed to send, so they both know when something is wrong. If that happens, the system enters a continuous resetting loop. This is the screen flashing you see with a dirty cart.
When you cut pin 4 of the NES lockout chip, you are making it think it is inside the cart. It sits there waiting for the ID from the NES which never happens, so the system is never reset. If you were to completely remove the NES lockout chip the system would not work because it controls the reset button.
Most lockout defeaters used by the unlicensed game companies used large voltage spikes sent from the cart to the NES lockout. When timed right those would crash the NES lockout, preventing it from resetting the system. Nintendo slowly added protection against those on the NES board. Next time you open your NES, check the board for the revision number. Right in the middle it will say NES-CPU- then a number. That number is the revision. If you have 05 it is an early one. 07 and 09 added some lockout protection. 11 was the last version with the most lockout protection. Almost all unlicensed carts that use lockout defeaters will not work on a NES-CPU-11 system.
CPU Overview
The NES CPU is a modified 6502, an 8 bit data processor similar to the Apple 2, Atari 2600, C64, and many other systems. By the time the Famicom was created it was underpowered for a computer but great for a game system.
The CPU has a 16 bit address bus which can access up to 64KB of memory. 2^16 = 65536, or 64KB. Included in that memory space is the 2KB of CPU RAM, ports to access PPU/APU/controllers, WRAM (if on the cart), and 32KB for PRG ROM. The 16 bit addresses are written in hex, so they become 4 digits starting with a $ symbol. For example, the internal RAM is at $0000-$07FF. $0800 = 2048, or 2KB. 32KB quickly became too small for games, which is why memory mappers were used. Those mappers can swap in different banks of PRG code or CHR graphics. Mappers like the MMC3 allowed up to 512KB of PRG and 256KB of CHR. There is no limit to the memory size if you create a new mapper chip, but 128KB PRG and 64KB CHR was the most common size.
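To make the address ranges concrete, here is a Python sketch that names the region a CPU address falls in. The boundaries follow the commonly documented NES CPU memory map: RAM mirrors, PPU port mirrors, $6000-$7FFF for cart WRAM, and $8000-$FFFF for PRG ROM. That is a bit more detail than this lesson lists, and mapper-specific details are ignored.

```python
def cpu_region(addr):
    """Name the part of the NES CPU address space that `addr` falls in."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("the CPU has a 16-bit address bus")
    if addr <= 0x07FF:
        return "internal RAM"             # 2KB: $0000-$07FF
    if addr <= 0x1FFF:
        return "internal RAM mirror"      # the same 2KB repeated
    if addr <= 0x3FFF:
        return "PPU ports"                # $2000-$2007, mirrored up to $3FFF
    if addr <= 0x4017:
        return "APU / controller ports"   # controllers are $4016 and $4017
    if addr <= 0x5FFF:
        return "other cart / test space"
    if addr <= 0x7FFF:
        return "WRAM (if on the cart)"
    return "PRG ROM"                      # 32KB: $8000-$FFFF


for a in (0x0000, 0x07FF, 0x0800, 0x2002, 0x4016, 0x6000, 0x8000, 0xFFFF):
    print(f"${a:04X} {cpu_region(a)}")
```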
PPU Overview
The NES PPU is a custom chip that does all the graphics display. It includes internal RAM for sprites and the color palette. There is RAM on the NES board that holds the background, and all actual graphics are fetched from the cart CHR memory.
Your program does not run on the PPU; the PPU always goes through the same display order. You only set some options like colors and scrolling. The PPU processes one TV scanline at a time. First the sprites are fetched from the cart CHR memory. If there are more than 8 sprites on the scanline, the rest are ignored. This is why some games like Super Dodge Ball will blink when there is a lot happening on screen. After the sprites, the background is fetched from CHR memory. When all the scanlines are done there is a period when no graphics are sent out. This is called VBlank and is the only time graphics updates can be done. PAL has a longer VBlank time (when the TV cathode ray gun is going back to the top of the screen), which allows more time for graphics updates. Some PAL games and demos do not run on NTSC systems because of this difference in VBlank time. Both the NTSC and PAL systems have a resolution of 256x240 pixels, but the top and bottom 8 rows are typically cut off by the NTSC TV, resulting in 256x224. TV variations will cut off an additional 0-8 rows, so you should leave a border before drawing important information.
NTSC runs at 60Hz and PAL runs at 50Hz. Running an NTSC game on a PAL system will be slower because of this timing difference. Sounds will also be slower.
Graphics System Overview
All graphics are made up of 8x8 pixel tiles. Large characters like Mario are made from multiple 8x8 tiles. All the backgrounds are also made from these tiles. The tile system means less memory is needed (memory was expensive at the time) but also means that things like bitmap pictures and 3d graphics aren't really possible. To see all the tiles in a game, download Tile Molester and open up your .NES file. Scroll down until you see graphics that don't look like static. You can see that small tiles are arranged by the game to make large images.
Sprites
The PPU has enough memory for 64 sprites, or things that move around on screen like Mario. Only 8 sprites per scanline are allowed; any more than that will be ignored. This is where the flickering comes from in some games when there are too many objects on screen.
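Here is a simplified Python model of the 8-per-scanline rule. It assumes each sprite is 8 pixels tall and covers scanlines y through y+7; real hardware draws a sprite one line below its stored Y value. It also assumes the first 8 sprites in memory order win. `visible_on_line` is an illustrative name.

```python
MAX_SPRITES = 64
PER_LINE = 8


def visible_on_line(sprite_ys, scanline):
    """Indexes of sprites drawn on `scanline`; any past the first 8 are dropped."""
    if len(sprite_ys) > MAX_SPRITES:
        raise ValueError("the PPU only holds 64 sprites")
    hits = [i for i, y in enumerate(sprite_ys) if y <= scanline < y + 8]
    return hits[:PER_LINE]


ys = [100] * 10 + [20]            # ten sprites side by side, one elsewhere
print(visible_on_line(ys, 103))   # sprites 0-7 only; 8 and 9 disappear
print(visible_on_line(ys, 20))    # [10]
```

Games that flicker typically change the sprite order from frame to frame, so a different sprite drops out each time instead of one disappearing for good.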
Background
The background is the landscape graphics, which scroll all at once. The sprites can be displayed either in front of or behind the background. The screen is big enough for 32x30 background tiles, and there is enough internal RAM to hold 2 screens. When games scroll, the background graphics are updated off screen before they are scrolled on screen.
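Here is the arithmetic behind the 32x30 tiles and the 2 screens, in Python. It assumes the standard nametable layout, which this lesson doesn't cover yet: one byte per tile, followed by a 64-byte attribute table, for 1024 bytes per screen.

```python
TILE = 8                                     # tiles are 8x8 pixels
cols, rows = 256 // TILE, 240 // TILE        # 256x240 screen
tiles_per_screen = cols * rows
nametable_bytes = tiles_per_screen + 64      # 960 tile bytes + 64 attribute bytes
print(cols, rows, tiles_per_screen, nametable_bytes)   # 32 30 960 1024
print(2 * 1024 // nametable_bytes)                     # 2 screens fit in 2KB
```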
Pattern Tables
These are where the actual tile data is stored. It is either ROM or RAM on the cart. Each pattern table holds 256 tiles. One table is used for backgrounds, and the other for sprites. All graphics currently on screen must be in these tables.
Attribute Tables
These tables set the color information in 2x2 tile sections. This means that a 16x16 pixel area can only have 4 different colors selected from the palette.
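The 2x2 tile rule comes from how the attribute bytes are packed. This Python sketch finds which byte and which bits choose the palette for a tile. It assumes the standard layout: 64 attribute bytes per screen, each covering a 4x4-tile block, split into four 2-bit fields. The function names are illustrative.

```python
def attribute_location(tile_x, tile_y):
    """Return (attribute byte index, bit shift) for the tile at (tile_x, tile_y)."""
    byte_index = (tile_y // 4) * 8 + (tile_x // 4)            # 8 bytes per row of blocks
    shift = ((tile_y % 4) // 2) * 4 + ((tile_x % 4) // 2) * 2  # which 2x2 quadrant
    return byte_index, shift


def palette_for_tile(attr_bytes, tile_x, tile_y):
    """Read the 2-bit palette number that applies to one tile."""
    index, shift = attribute_location(tile_x, tile_y)
    return (attr_bytes[index] >> shift) & 0b11


attrs = [0] * 64
attrs[0] = 0b11_10_01_00        # top-left = 0, top-right = 1, bottom-left = 2, bottom-right = 3
print([palette_for_tile(attrs, x, y) for (x, y) in ((0, 0), (2, 0), (0, 2), (3, 3))])
```

Tiles (0,0), (1,0), (0,1), and (1,1) all read the same 2 bits, which is why a 16x16 pixel area shares one set of 4 colors.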
Palettes
There are two palettes holding the color information, one for the background and one for sprites. Each palette has 16 colors.
To display a tile on screen, the pixel color index is taken from the Pattern Table and the Attribute Table. That index is then looked up in the Palette to get the actual color.
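Here is that lookup as a Python sketch for one row of a background tile. It assumes the standard tile format: 16 bytes per tile, with bytes 0-7 as the low bit plane and bytes 8-15 as the high bit plane, and bit 7 as the leftmost pixel. It also assumes the 16-entry background palette starts at PPU address $3F00 and is split into four groups of 4 colors, with the attribute bits picking the group. The helper names are illustrative.

```python
def tile_row_indexes(tile, row):
    """2-bit color index of each of the 8 pixels in one row of a tile."""
    lo, hi = tile[row], tile[row + 8]
    return [((lo >> bit) & 1) | (((hi >> bit) & 1) << 1) for bit in range(7, -1, -1)]


def background_color_address(palette, pixel):
    """Palette RAM address: attribute bits choose the group, pixel bits the entry."""
    return 0x3F00 + palette * 4 + pixel


tile = [0b11110000] + [0] * 7 + [0b11001100] + [0] * 7
pixels = tile_row_indexes(tile, 0)
print(pixels)                                                     # [3, 3, 1, 1, 2, 2, 0, 0]
print(hex(background_color_address(palette=2, pixel=pixels[0])))  # 0x3f0b
```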
To see all the graphics, download the FCEUXD SP emulator. Open up your .NES game and choose PPU Viewer from the Tools menu. This will show you all the active background tiles, all the active sprite tiles, and the color palettes. Then choose Name Table Viewer from the Tools menu. This will show you the backgrounds as they will appear on screen. If you choose a game that scrolls like SMB you can see the off screen background sections being updated.
NEXT WEEK: CPU details, start of 6502 assembly programming
This week: general overview of the NES architecture with the major components covered. All general purpose computers are arranged the same way with a place to store code (ROM), a place to store variables (RAM), and a processor to run code (CPU). The NES also adds another processor to generate the graphics (PPU) and a section of the CPU to generate audio (APU). Everything here is very general and will have more details than you want in the next few weeks.
NES System Architecture
KB - Memory size is listed in KiloBytes or KB. 1KB = 1024 bytes. Everything is powers of 2, so 2^10 = 1024 is used instead of 1000. If the capitalization is different, the meaning can change. Kb is Kilobits. Divide Kb by 8 to get KB, because 1 byte = 8 bits.
ROM - Read Only Memory, holds data that cannot be changed. This is where the game code or graphics is stored on the cart.
RAM - Random Access Memory, holds data that can be read and written. When power is removed, the chip is erased. A battery can be used to keep power and data valid.
PRG - Program memory, the code for the game
CHR - Character memory, the data for graphics
CPU - Central Processing Unit, the main processor chip
PPU - Picture Processing Unit, the graphics chip
APU - Audio Processing Unit, the sound chip inside the CPU
System Overview
The NES includes a custom 6502 based CPU with the APU and controller handling inside one chip, and a PPU that displays graphics in another chip. Your code instructions run on the CPU and sends out commands to the APU and PPU. The NOAC (NES On A Chip) clones like the Yobo and NEX put all of these parts onto one chip.
There is only 2KB of RAM connected to the CPU for storing variables, and 2KB of RAM connected to the PPU for holding two TV screens of background graphics. Some carts add extra CPU RAM, called Work RAM or WRAM. If a cart needs to store saved games, this WRAM will have a battery attached to make sure it isn't erased. A few carts add extra PPU RAM to hold four screens of background graphics at once. This is not common. The rest of this tutorial will not use WRAM or four screen RAM.
Each cart includes at least three chips. One holds the program code (PRG), another holds the character graphics (CHR), and the last is the lockout. The graphics chip can be RAM instead of ROM, which means the game code would copy graphics from the PRG ROM chip to the CHR RAM. PRG is always a ROM chip.
Lockout Chip
Inside the NES and the cart are also two lockout chips. The lockout chip controls resetting the console. First the NES lockout sends out a stream ID, 0-15. The cart lockout records this number. Then both lockout chips run a complex equation using that number and send the results to each other. Both chips know what the other is supposed to send so they both know when something is wrong. If that happens the system enters the continuous reseting loop. This is the screen flashing you see with a dirty cart.
When you cut pin 4 of the NES lockout chip, you are making it think it is inside the cart. It sits there waiting for the ID from the NES which never happens, so the system is never reset. If you were to completely remove the NES lockout chip the system would not work because it controls the reset button.
Most lockout defeaters used by the unlicensed game companies used large voltage spikes sent from the cart to the NES lockout. When timed right those would crash the NES lockout, preventing it from resetting the system. Nintendo slowly added protection against those on the NES board. Next time you open your NES, check the board for the revision number. Right in the middle it will say NES-CPU- then a number. That number is the revision. If you have 05 it is an early one. 07 and 09 added some lockout protection. 11 was the last version with the most lockout protection. Almost all unlicensed carts that use lockout defeaters will not work on a NES-CPU-11 system.
CPU Overview
The NES CPU is a modified 6502, an 8 bit data processor similar to the Apple 2, Atari 2600, C64, and many other systems. By the time the Famicom was created it was underpowered for a computer but great for a game system.
The CPU has a 16 bit address bus which can access up to 64KB of memory. 2^16 = 65536, or 64KB. Included in that memory space is the 2KB of CPU RAM, ports to access PPU/APU/controllers, WRAM (if on the cart), and 32KB for PRG ROM. The 16 bit addresses are written in hex, so they become 4 digits starting with a $ symbol. For example the internal RAM is at $0000-0800. $0800 = 2048 or 2KB. 32KB quickly became too small for games, which is why memory mappers were used. Those mappers can swap in different banks of PRG code or CHR graphics. Mappers like the MMC3 allowed up to 512KB of PRG, and 256KB of CHR. There is no limit to the memory size if you create a new mapper chip, but 128KB PRG and 64KB CHR was the most common size.
PPU Overview
The NES PPU is a custom chip that does all the graphics display. It includes internal RAM for sprites and the color palette. There is RAM on the NES board that holds the background, and all actual graphics are fetched from the cart CHR memory.
Your program does not run on the PPU, the PPU always goes through the same display order. You only set some options like colors and scrolling. The PPU processes one TV scanline at a time. First the sprites are fetched from the cart CHR memory. If there are more than 8 sprites on the scanline the rest are ignored. This is why some games like Super Dodge Ball will blink when there is lots happening on screen. After the sprites the background is fetched from CHR memory. When all the scanlines are done there is a period when no graphics are sent out. This is called VBlank and is the only time graphics updates can be done. PAL has a longer VBlank time (when the TV cathode ray gun is going back to the top of the screen) which allows more time for graphics updates. Some PAL games and demos do not run on NTSC systems because of this difference in VBlank time. Both the NTSC and PAL systems have a resolution of 256x240 pixels, but the top and bottom 8 rows are typically cut off by the NTSC TV resulting in 256x224. TV variations will cut off an additional 0-8 rows, so you should allow for a border before drawing important information.
NTSC runs at 60Hz and PAL runs at 50Hz. Running an NTSC game on a PAL system will be slower because of this timing difference. Sounds will also be slower.
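Here is a rough worked example of that slowdown. The Python sketch below assumes a game that moves an object a fixed number of pixels once per frame. That is a common design, but not every game works this way. It uses the nominal 60Hz and 50Hz rates from the text.

```python
# Sketch: why a once-per-frame game loop runs slower on PAL than on NTSC.
# Assumption: the game advances its logic exactly once per frame (once per NMI),
# and the nominal 60Hz / 50Hz frame rates are used.

NTSC_HZ = 60
PAL_HZ = 50

def pixels_per_second(pixels_per_frame: int, frame_rate: int) -> int:
    """Distance covered in one second by an object moved once per frame."""
    return pixels_per_frame * frame_rate

ntsc = pixels_per_second(2, NTSC_HZ)   # 2 pixels per frame on NTSC
pal = pixels_per_second(2, PAL_HZ)     # same code running on PAL
slowdown = 1 - pal / ntsc              # fraction of speed lost on PAL

print(ntsc, pal, round(slowdown * 100, 1))
```

With these numbers the object covers 120 pixels per second on NTSC and 100 on PAL, roughly a 17% slowdown.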
Graphics System Overview
All graphics are made up of 8x8 pixel tiles. Large characters like Mario are made from multiple 8x8 tiles. All the backgrounds are also made from these tiles. The tile system means less memory is needed (memory was expensive at the time), but it also means that things like bitmap pictures and 3D graphics aren't really possible. To see all the tiles in a game, download Tile Molester and open up your .NES file. Scroll down until you see graphics that don't look like static. You can see that small tiles are arranged by the game to make large images.
The PPU has enough memory for 64 sprites. Sprites are the things that move around on screen, like Mario. Only 8 sprites per scanline are allowed; any more than that will be ignored. This is where the flickering comes from in some games when there are too many objects on screen.
The background is the landscape graphics, which scroll all at once. The sprites can be displayed either in front of or behind the background. The screen is big enough for 32x30 background tiles, and there is enough internal RAM to hold 2 screens. When a game scrolls, the background graphics are updated off screen before they are scrolled on screen.
Pattern Tables
These are where the actual tile data is stored. It is either ROM or RAM on the cart. Each pattern table holds 256 tiles. One table is used for backgrounds, and the other for sprites. All graphics currently on screen must be in these tables.
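To make the tile format concrete, here is a small Python sketch that decodes one tile. Python is used only as a model; the tutorial's own code is 6502 assembly. The sketch relies on the standard NES CHR layout:

- Each 8x8 tile is 16 bytes, split into two 8-byte bit planes.
- That is why one pattern table of 256 tiles takes 256 x 16 = 4096 bytes (4KB).

The function name decode_tile is made up for this example.

```python
# Sketch: decode one 8x8 NES tile (16 bytes of CHR data) into 2-bit pixel values.
# Layout: bytes 0-7 are the low bit plane (one byte per row),
#         bytes 8-15 are the high bit plane. Bit 7 is the leftmost pixel.

def decode_tile(chr16: bytes) -> list[list[int]]:
    if len(chr16) != 16:
        raise ValueError("a tile is exactly 16 bytes")
    rows = []
    for y in range(8):
        low = chr16[y]          # low bit plane for this row
        high = chr16[y + 8]     # high bit plane for this row
        row = []
        for x in range(8):
            bit = 7 - x         # leftmost pixel comes from bit 7
            value = (((high >> bit) & 1) << 1) | ((low >> bit) & 1)
            row.append(value)   # 0-3: which of the 4 colors in a palette set
        rows.append(row)
    return rows

# Example: low plane draws a diagonal line, high plane is all zeros.
diagonal = bytes([0x80 >> i for i in range(8)] + [0x00] * 8)
for row in decode_tile(diagonal):
    print("".join(str(v) for v in row))
```

Each printed row has a single 1 on the diagonal. Setting bits in the high plane instead would produce values 2 and 3.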
Attribute Tables
These tables set the color information in 2x2 tile sections. This means that a 16x16 pixel area can only have 4 different colors selected from the palette.
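The 2x2 grouping can be sketched as an address calculation. This Python model makes three assumptions:

- It uses the first nametable, whose 64-byte attribute table starts at PPU address $23C0.
- Each attribute byte covers a 4x4 tile area.
- Each 2-bit field inside that byte covers one 2x2 tile corner.

The function names are hypothetical.

```python
# Sketch: find which attribute byte, and which 2 bits of it, pick the palette
# for a background tile. Assumes the first nametable, whose attribute table
# starts at PPU address $23C0 (64 bytes, each covering 4x4 tiles / 32x32 pixels).

ATTRIBUTE_BASE = 0x23C0

def attribute_location(tile_x: int, tile_y: int) -> tuple[int, int]:
    """Return (PPU address of the attribute byte, bit shift of its 2-bit field)."""
    if not (0 <= tile_x < 32 and 0 <= tile_y < 30):
        raise ValueError("tile is outside the 32x30 screen")
    address = ATTRIBUTE_BASE + (tile_y // 4) * 8 + (tile_x // 4)
    # Inside the 4x4 block each 2x2 corner uses 2 bits:
    # top-left = bits 0-1, top-right = 2-3, bottom-left = 4-5, bottom-right = 6-7
    shift = ((tile_y % 4) // 2) * 4 + ((tile_x % 4) // 2) * 2
    return address, shift

def palette_for_tile(attribute_byte: int, shift: int) -> int:
    """Extract the 2-bit palette number (0-3) for the tile."""
    return (attribute_byte >> shift) & 0b11

addr, shift = attribute_location(5, 2)
print(hex(addr), shift)
```

Tile (5, 2) uses attribute byte $23C1, bits 4-5. All four tiles in that 2x2 corner share the same palette, which is the 16x16 pixel limit described above.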
The palettes are two areas that hold the color information, one for the background and one for sprites. Each palette has 16 colors.
To display a tile on screen, the pixel color index is taken from the Pattern Table and the Attribute Table. That index is then looked up in the Palette to get the actual color.
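The lookup just described can be modeled as below. This is a sketch, not emulator code. It assumes the standard layout:

- Each 16-color palette is four sets of 4 colors.
- The background palette starts at $3F00 and the sprite palette at $3F10.
- A pixel value of 0 shows the shared backdrop color for backgrounds, or is transparent for sprites.

The function name palette_address is hypothetical.

```python
# Sketch: combine the 2-bit pixel value (from the pattern table) with the
# 2-bit palette number (from the attribute table for backgrounds, or from the
# sprite's attribute byte for sprites) to find the palette entry in the PPU.
from typing import Optional

BACKGROUND_PALETTES = 0x3F00
SPRITE_PALETTES = 0x3F10

def palette_address(pixel: int, palette: int, is_sprite: bool) -> Optional[int]:
    """PPU address whose byte gives the final color, or None if transparent."""
    if not (0 <= pixel <= 3 and 0 <= palette <= 3):
        raise ValueError("pixel and palette are both 2-bit values")
    if pixel == 0:
        # Value 0 never uses its own entry: background pixels show the shared
        # backdrop color at $3F00, sprite pixels are transparent.
        return None if is_sprite else BACKGROUND_PALETTES
    base = SPRITE_PALETTES if is_sprite else BACKGROUND_PALETTES
    return base + palette * 4 + pixel

print(hex(palette_address(3, 2, is_sprite=False)))
```

A background pixel of value 3 in palette set 2 reads its color from $3F0B.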
To see all the graphics, download the FCEUXD SP emulator. Open up your .NES game and choose PPU Viewer from the Tools menu. This will show you all the active background tiles, all the active sprite tiles, and the color palettes. Then choose Name Table Viewer from the Tools menu. This will show you the backgrounds as they will appear on screen. If you choose a game that scrolls like SMB you can see the off screen background sections being updated.
NEXT WEEK: CPU details, start of 6502 assembly programming
Nerdy Nights week 3: 6502 ASM, first app
Previous Week - NES architecture overview
This Week: starts getting into more details about the 6502 and intro to assembly language. The lessons for asm usage and NES specifics will be done in sections together. There are many other 6502 websites and good books which may help you learn better.
6502 Assembly
Bit - The smallest unit in computers. It is either a 1 (on) or a 0 (off), like a light switch.
Byte - 8 bits together form one byte, a number from 0 to 255. Two bytes put together is 16 bits, forming a number from 0 to 65535. Bits in the byte are numbered starting from the right at 0.
Instruction - one command a processor executes. Instructions are run sequentially.
Code Layout
In assembly language there are 5 main parts. Some parts must be in a specific horizontal position for the assembler to use them correctly.
Directives are commands you send to the assembler to do things like locating code in memory. They start with a . and are indented. Some people use tabs, or 4 spaces, and I use 2 spaces. This sample directive tells the assembler to put the code starting at memory location $8000, which is inside the game ROM area:
  .org $8000
The label is aligned to the far left and has a : at the end. The label is just something you use to organize your code and make it easier to read. The assembler translates the label into an address. Sample label:
  .org $8000
MyFunction:
When the assembler runs, it will do a find/replace to set MyFunction to $8000. Then if you have any code that uses MyFunction like:
  STA MyFunction
it will find/replace to:
  STA $8000
The opcode is the instruction that the processor will run, and is indented like the directives. In this sample, JMP is the opcode that tells the processor to jump to the MyFunction label:
  .org $8000
MyFunction:
  JMP MyFunction
The operands are additional information for the opcode. An opcode takes either no operand or one operand, so a complete instruction is one to three bytes long. In this example the #$FF is the operand:
  .org $8000
MyFunction:
  LDA #$FF
  JMP MyFunction
Comments are to help you understand in English what the code is doing. When you write code and come back later, the comments will save you. You do not need a comment on every line, but should have enough to explain what is happening. Comments start with a ; and are completely ignored by the assembler. They can be put anywhere horizontally, but are usually spaced beyond the long lines.
  .org $8000
MyFunction:        ; loads FF into accumulator
  LDA #$FF
  JMP MyFunction
This code would just continually run the loop, loading the hex value $FF into the accumulator each time.
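To show what "translates the label into an address" means, this Python sketch plays the assembler's role for the sample loop. One pass assigns addresses to labels, and a second pass emits bytes. It only knows the two opcodes used here (LDA immediate is $A9, JMP absolute is $4C), so treat it as an illustration, not an assembler. The output is the kind of hex that the FCEUXD SP debugger shows beside each line.

```python
# Sketch of a two-pass assembly of the sample loop. Only LDA immediate
# ($A9, 2 bytes) and JMP absolute ($4C, 3 bytes) are modeled.

ORG = 0x8000                      # from .org $8000
program = [
    ("label", "MyFunction"),      # MyFunction:
    ("LDA#", 0xFF),               #   LDA #$FF
    ("JMP", "MyFunction"),        #   JMP MyFunction
]
SIZES = {"label": 0, "LDA#": 2, "JMP": 3}

# Pass 1: record the address each label lands on.
labels = {}
address = ORG
for kind, arg in program:
    if kind == "label":
        labels[arg] = address
    address += SIZES[kind]

# Pass 2: emit machine code, replacing label names with their addresses.
machine_code = bytearray()
for kind, arg in program:
    if kind == "LDA#":
        machine_code += bytes([0xA9, arg])
    elif kind == "JMP":
        target = labels[arg]
        machine_code += bytes([0x4C, target & 0xFF, target >> 8])  # low byte first

print(hex(labels["MyFunction"]), machine_code.hex(" ").upper())
```

This prints 0x8000 A9 FF 4C 00 80. The JMP target $8000 is stored low byte first.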
6502 Processor Overview
The 6502 is an 8 bit processor with a 16 bit address bus. It can access 64KB of memory without bank switching. In the NES this memory space is split up into RAM, PPU/Audio/Controller access, and game ROM.
$0000-07FF - Internal RAM, 2KB chip in the NES
$2000-2007 - PPU access ports
$4000-4017 - Audio and controller access ports
$6000-7FFF - Optional WRAM inside the game cart
$8000-FFFF - Game cart ROM
Any of the game cart sections can be bank switched to get access to more memory, but memory mappers will not be included in this tutorial.
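The memory map can be turned into a small lookup. This Python sketch only knows the five ranges listed above. Anything outside them is reported as "other", which is a simplification of this example; that includes the mirrors of RAM at $0800-1FFF and of the PPU ports at $2008-3FFF. The names REGIONS and region are hypothetical.

```python
# Sketch: classify a CPU address using the NES memory map above.
# Anything outside the five listed ranges (mirrors, expansion area) is "other".

REGIONS = [
    (0x0000, 0x07FF, "internal RAM"),
    (0x2000, 0x2007, "PPU ports"),
    (0x4000, 0x4017, "audio/controller ports"),
    (0x6000, 0x7FFF, "cart WRAM (optional)"),
    (0x8000, 0xFFFF, "cart PRG ROM"),
]

def region(address: int) -> str:
    if not 0 <= address <= 0xFFFF:
        raise ValueError("the 6502 has a 16 bit address bus")
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    return "other"

print(region(0x0200), "|", region(0x2001), "|", region(0x4016), "|", region(0xC000))
print(0x07FF - 0x0000 + 1)   # size of internal RAM in bytes
```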
6502 Assembly Overview
The assembly language for 6502 starts with a 3 character code for the instruction "opcode". There are 56 instructions, 10 of which you will use frequently. Many instructions will have a value after the opcode, which you can write in decimal or hex. If that value starts with a # then it means use the actual number. If the value doesn't have the # then it means use the value at that address. So LDA #$05 means load the value 5, and LDA $0005 means load the value that is stored at address $0005.
6502 Registers
A register is a place inside the processor that holds a value. The 6502 has three 8 bit registers and a status register that you will be using. All your data processing uses these registers. There are additional registers that are not covered in this tutorial.
The Accumulator (A) is the main 8 bit register for loading, storing, comparing, and doing math on data. Some of the most frequent operations are:
  LDA #$FF   ; load the hex value $FF (decimal 255) into A
  STA $0000  ; store the accumulator into memory location $0000, internal RAM
Index Register X
The Index Register X (X) is another 8 bit register, usually used for counting or memory access. In loops you will use this register to keep track of how many times the loop has gone, while using A to process data. Some frequent operations are:
  LDX $0000  ; load the value at memory location $0000 into X
  INX        ; increment X: X = X + 1
Index Register Y
The Index Register Y (Y) works almost the same as X. Some instructions (not covered here) only work with X and not Y. Some operations are:
  STY $00BA  ; store Y into memory location $00BA
  TYA        ; transfer Y into Accumulator
Status Register
The Status Register holds flags with information about the last instruction. For example when doing a subtract you can check if the result was a zero.
6502 Instruction Set
These are just the most common and basic instructions. Most have a few different options which will be used later. There are also a few more complicated instructions to be covered later.
Common Load/Store opcodes
  LDA #$0A   ; LoaD the value 0A into the accumulator A
             ; the number part of the opcode can be a value or an address
             ; if the value is zero, the zero flag will be set.
  LDX $0000  ; LoaD the value at address $0000 into the index register X
             ; if the value is zero, the zero flag will be set.
  LDY #$FF   ; LoaD the value $FF into the index register Y
             ; if the value is zero, the zero flag will be set.
  STA $2000  ; STore the value from accumulator A into the address $2000
             ; the number part must be an address
  STX $4016  ; STore value in X into $4016
             ; the number part must be an address
  STY $0101  ; STore Y into $0101
             ; the number part must be an address
  TAX        ; Transfer the value from A into X
             ; if the value is zero, the zero flag will be set
  TAY        ; Transfer A into Y
             ; if the value is zero, the zero flag will be set
  TXA        ; Transfer X into A
             ; if the value is zero, the zero flag will be set
  TYA        ; Transfer Y into A
             ; if the value is zero, the zero flag will be set
Common Math opcodes
  ADC #$01   ; ADd with Carry
             ; A = A + $01 + carry
             ; if the result is zero, the zero flag will be set
  SBC #$80   ; SuBtract with Carry
             ; A = A - $80 - (1 - carry)
             ; if the result is zero, the zero flag will be set
  CLC        ; CLear Carry flag in status register
             ; usually this should be done before ADC
  SEC        ; SEt Carry flag in status register
             ; usually this should be done before SBC
  INC $0100  ; INCrement value at address $0100
             ; if the result is zero, the zero flag will be set
  DEC $0001  ; DECrement $0001
             ; if the result is zero, the zero flag will be set
  INY        ; INcrement Y register
             ; if the result is zero, the zero flag will be set
  INX        ; INcrement X register
             ; if the result is zero, the zero flag will be set
  DEY        ; DEcrement Y
             ; if the result is zero, the zero flag will be set
  DEX        ; DEcrement X
             ; if the result is zero, the zero flag will be set
  ASL A      ; Arithmetic Shift Left
             ; shift all bits one position to the left
             ; this is a multiply by 2
             ; if the result is zero, the zero flag will be set
  LSR $6000  ; Logical Shift Right
             ; shift all bits one position to the right
             ; this is a divide by 2
             ; if the result is zero, the zero flag will be set
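Why CLC comes before ADC and SEC before SBC is easier to see with the arithmetic written out. This Python sketch models only A, the carry flag, and the zero flag; the real 6502 also sets the negative and overflow flags. adc and sbc are hypothetical helper names.

```python
# Sketch: how ADC and SBC compute A and update carry (C) and zero (Z).
# Only C and Z are modeled; the real 6502 also updates N and V.

def adc(a: int, value: int, carry: int) -> tuple[int, int, int]:
    """A = A + value + carry. Returns (new A, carry flag, zero flag)."""
    total = a + value + carry
    result = total & 0xFF                  # A only holds 8 bits
    return result, int(total > 0xFF), int(result == 0)

def sbc(a: int, value: int, carry: int) -> tuple[int, int, int]:
    """A = A - value - (1 - carry). Carry set afterwards means no borrow."""
    total = a - value - (1 - carry)
    result = total & 0xFF                  # negative totals wrap around
    return result, int(total >= 0), int(result == 0)

# CLC then ADC #$01 with A = $FF: wraps to $00, sets carry and zero.
print(adc(0xFF, 0x01, carry=0))
# SEC then SBC #$80 with A = $90: gives $10, no borrow so carry stays set.
print(sbc(0x90, 0x80, carry=1))
```

If the carry is left over from an earlier instruction, ADC adds an extra 1 and SBC subtracts an extra 1. That is why the carry is set up first.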
Common Comparison opcodes
  CMP #$01   ; CoMPare A to the value $01
             ; this actually does a subtract, but does not keep the result
             ; instead you check the status register to check for equal,
             ; less than, or greater than
  CPX $0050  ; ComPare X to the value at address $0050
  CPY #$FF   ; ComPare Y to the value $FF
Common Control Flow opcodes
  JMP $8000  ; JuMP to $8000, continue running code there
  BEQ $FF00  ; Branch if EQual, continue running code there
             ; first you would do a CMP, which clears or sets the zero flag
             ; then the BEQ will check the zero flag
             ; if zero is set (values were equal) the code jumps to $FF00 and runs there
             ; if zero is clear (values not equal) there is no jump, runs next instruction
  BNE $FF00  ; Branch if Not Equal - opposite of BEQ, jump is made when zero flag is clear
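The compare-then-branch pattern can be modeled the same way. This sketch assumes unsigned 8-bit values and models the Z, C and N flags that CMP, CPX and CPY set. compare and branch_taken are hypothetical names, not part of any assembler.

```python
# Sketch: CMP subtracts without storing the result; it only sets flags.
# Z = values equal, C = register >= value (unsigned), N = bit 7 of the difference.
# BEQ branches when Z is set, BNE when Z is clear.

def compare(register: int, value: int) -> dict[str, int]:
    diff = (register - value) & 0xFF
    return {"Z": int(register == value), "C": int(register >= value), "N": diff >> 7}

def branch_taken(opcode: str, flags: dict[str, int]) -> bool:
    if opcode == "BEQ":
        return flags["Z"] == 1
    if opcode == "BNE":
        return flags["Z"] == 0
    raise ValueError("only BEQ and BNE are modeled")

flags = compare(0x20, 0x20)   # e.g. CPX #$20 once X has reached 32
print(flags, branch_taken("BNE", flags))
```

When the values are equal, Z is set, so BNE falls through to the next instruction. That is how the loops later in this tutorial stop.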
NES Code Structure
Getting Started
This section has a lot of information because it will get everything set up to run your first NES program. Much of the code can be copy/pasted then ignored for now. The main goal is to just get NESASM to output something useful.
iNES Header
The 16 byte iNES header gives the emulator all the information about the game including mapper, graphics mirroring, and PRG/CHR sizes. You can include all this inside your asm file at the very beginning.
  .inesprg 1   ; 1x 16KB bank of PRG code
  .ineschr 1   ; 1x 8KB bank of CHR data
  .inesmap 0   ; mapper 0 = NROM, no bank swapping
  .inesmir 1   ; background mirroring (ignore for now)
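For reference, here is a Python sketch of the 16 header bytes those four directives describe. It follows the standard iNES layout:

- The "NES" signature plus $1A.
- PRG size in 16KB units.
- CHR size in 8KB units.
- Two flag bytes holding the mirroring bit and the mapper number.

It assumes the unused bytes are zero, which may not match every assembler byte for byte. The function name ines_header is hypothetical.

```python
# Sketch: the 16 byte iNES header described by .inesprg/.ineschr/.inesmap/.inesmir.
# Byte 6: mirroring in bit 0, low nibble of the mapper number in bits 4-7.
# Byte 7: high nibble of the mapper number in bits 4-7. Bytes 8-15 assumed zero.

def ines_header(prg_16k: int, chr_8k: int, mapper: int, mirroring: int) -> bytes:
    flags6 = ((mapper & 0x0F) << 4) | (mirroring & 0x01)
    flags7 = mapper & 0xF0
    header = b"NES\x1a" + bytes([prg_16k, chr_8k, flags6, flags7])
    return header + bytes(16 - len(header))      # pad to 16 bytes with zeros

header = ines_header(prg_16k=1, chr_8k=1, mapper=0, mirroring=1)
print(header.hex(" ").upper())
```

For this program the header starts 4E 45 53 1A 01 01 01 00. Switching to mapper 3 (CNROM) would change byte 6 to $31.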
NESASM arranges everything in 8KB code and 8KB graphics banks. To fill the 16KB PRG space 2 banks are needed. Like most things in computing, the numbering starts at 0. For each bank you have to tell the assembler where in memory it will start.
  .bank 0
  .org $C000
  ; some code here
  .bank 1
  .org $E000
  ; more code here
  .bank 2
  .org $0000
  ; graphics here
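The bank numbers also decide where each 8KB chunk lands in the .NES file. This sketch assumes NESASM writes the banks in order right after the 16 byte header, PRG banks first and CHR banks after. With only 16KB of PRG, NROM maps it at $C000-FFFF (mirrored at $8000-BFFF), which is why bank 0 starts at $C000 here. The helper names are hypothetical.

```python
# Sketch: file offset of each NESASM 8KB bank, assuming banks follow the
# 16 byte header in order (PRG banks 0-1, then CHR bank 2 in this program).

HEADER_SIZE = 16
BANK_SIZE = 8 * 1024

def bank_file_offset(bank: int) -> int:
    return HEADER_SIZE + bank * BANK_SIZE

def rom_file_size(prg_16k: int, chr_8k: int) -> int:
    return HEADER_SIZE + prg_16k * 16 * 1024 + chr_8k * BANK_SIZE

for bank, org in [(0, 0xC000), (1, 0xE000), (2, 0x0000)]:
    print(f"bank {bank}: file offset {bank_file_offset(bank):#06x}, .org ${org:04X}")
print(rom_file_size(1, 1))   # expected size of background.nes in bytes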
Adding Binary Files
Additional data files are frequently used for graphics data or level data. The incbin directive can be used to include that data in your .NES file. This data will not be used yet, but is needed to make the .NES file size match the iNES header.
  .bank 2
  .org $0000
  .incbin "mario.chr"   ; includes 8KB graphics file from SMB1
There are three times when the NES processor will interrupt your code and jump to a new location. These vectors, held in PRG ROM, tell the processor where to go when that happens. Only the first two will be used in this tutorial.
NMI Vector - this happens once per video frame, when enabled. The PPU tells the processor it is starting the VBlank time and is available for graphics updates.
RESET Vector - this happens every time the NES starts up, or the reset button is pressed.
IRQ Vector - this is triggered from some mapper chips or audio interrupts and will not be covered.
These three must always appear in your assembly file in the right order. The .dw directive is used to define a Data Word (1 word = 2 bytes):
  .bank 1
  .org $FFFA     ; first of the three vectors starts here
  .dw NMI        ; when an NMI happens (once per frame if enabled) the
                 ; processor will jump to the label NMI:
  .dw RESET      ; when the processor first turns on or is reset, it will jump
                 ; to the label RESET:
  .dw 0          ; external interrupt IRQ is not used in this tutorial
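The .dw lines store each address as two bytes, low byte first (little-endian), which is the order the 6502 reads its vectors in. This Python sketch uses struct to show the six bytes at $FFFA-FFFF. The NMI address is a made-up example. RESET is assumed to be at $C000 because in this program it is the first label in bank 0.

```python
# Sketch: the six bytes the three .dw lines place at $FFFA-$FFFF.
# NMI's address is hypothetical; the assembler fills in the real one.
import struct

NMI = 0xC07B      # hypothetical address of the NMI: label
RESET = 0xC000    # RESET: at the start of bank 0 (.org $C000)
IRQ = 0x0000      # .dw 0, IRQ is unused

vectors = struct.pack("<HHH", NMI, RESET, IRQ)   # "<H" = little-endian 16-bit word
for offset, byte in enumerate(vectors):
    print(f"${0xFFFA + offset:04X}: ${byte:02X}")
```

On reset the 6502 reads $FFFC (low byte) and $FFFD (high byte), here $00 and $C0, and starts running at $C000.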
Reset Code
The reset vector was set to the label RESET, so when the processor starts up it will start running from the RESET: label. Using the .org directive, that code is placed in game ROM. A couple of modes are set right at the beginning. We are not using IRQs, so they are turned off. The NES 6502 processor does not have a decimal mode, so that is also turned off. This section does NOT include everything needed to run code on the real NES, but will work with the FCEUXD SP emulator. More reset code will be added later.
  .bank 0
  .org $C000
RESET:
  SEI          ; disable IRQs
  CLD          ; disable decimal mode
Completing The Program
Your first program will be very exciting, displaying an entire screen of one color! To do this the first PPU settings need to be written. This is done to memory address $2001. The 76543210 is the bit number, from 7 to 0. Those 8 bits form the byte you will write to $2001.
PPUMASK ($2001)
76543210
||||||||
|||||||+- Grayscale (0: normal color; 1: AND all palette entries
|||||||   with 0x30, effectively producing a monochrome display;
|||||||   note that colour emphasis STILL works when this is on!)
||||||+-- Disable background clipping in leftmost 8 pixels of screen
|||||+--- Disable sprite clipping in leftmost 8 pixels of screen
||||+---- Enable background rendering
|||+----- Enable sprite rendering
||+------ Intensify reds (and darken other colors)
|+------- Intensify greens (and darken other colors)
+-------- Intensify blues (and darken other colors)
So if you want to enable the sprites, you set bit 4 to 1. For this program bits 7, 6, 5 will be used to set the screen color:
  LDA #%10000000   ; intensify blues (the # loads the value itself, not address $80)
  STA $2001
Forever:
  JMP Forever      ; infinite loop
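Working out the PPUMASK byte is just setting bits. This Python sketch builds the byte from named flags so you can check values like %10000000 against the table. The keyword names are invented for this example.

```python
# Sketch: build a PPUMASK ($2001) byte from the bit meanings in the table.
# Keyword names are made up; the bit positions follow the table (bit 0 first).

def ppumask(grayscale=False, show_bg_left=False, show_sprites_left=False,
            background=False, sprites=False,
            red=False, green=False, blue=False) -> int:
    bits = [grayscale, show_bg_left, show_sprites_left, background,
            sprites, red, green, blue]               # bit 0 .. bit 7
    return sum(1 << n for n, on in enumerate(bits) if on)

print(f"%{ppumask(blue=True):08b}")     # the value used in this program
print(f"%{ppumask(sprites=True):08b}")  # enabling sprites sets bit 4
```

This prints %10000000 and %00010000. Setting red=True instead gives %00100000, which is bit 5.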
Putting It All Together
Download and unzip the sample files. All the code above is in the background.asm file. Make sure that file, mario.chr, and background.bat are in the same folder as NESASM3, then double click on background.bat. That will run NESASM3 and should produce background.nes. Run that NES file in FCEUXD SP to see your background color! Edit background.asm to change the intensity bits 7-5 to make the background red or green.
You can start the Debug... from the Tools menu in FCEUXD SP to watch your code run. Hit the Step Into button, choose Reset from the NES menu, then keep hitting Step Into to run one instruction at a time. On the left is the memory address, next is the hex opcode that the 6502 is actually running. This will be between one and three bytes. After that is the code you wrote, with the comments taken out and labels translated to addresses. The top line is the instruction that is going to run next. So far there isn't much code, but the debugger will be very helpful later.
NEXT WEEK: more PPU details, start of graphics
Nerdy Nights week 4: Color Palettes, Sprites, second app
Previous Week - 6502 ASM
This Week: now that you can make and run a program, time to put something on screen!
Before putting any graphics on screen, you first need to set the color palette. There are two separate palettes, each 16 bytes. One palette is used for the background, and the other for sprites. The byte in the palette corresponds to one of the 64 base colors the NES can display. $0D is a bad color and should not be used. These colors are not exact and will look different on emulators and TVs.
The palettes start at PPU address $3F00 and $3F10. To set this address, PPU address port $2006 is used. This port must be written twice, once for the high byte then for the low byte:
  LDA $2002    ; read PPU status to reset the high/low latch to high
  LDA #$3F
  STA $2006    ; write the high byte of $3F10 address
  LDA #$10
  STA $2006    ; write the low byte of $3F10 address
That code tells the PPU to set its address to $3F10. Now the PPU data port at $2007 is ready to accept data. The first write will go to the address you set ($3F10), then the PPU will automatically increment the address ($3F11, $3F12, $3F13) after each read or write. You can keep writing data and it will keep incrementing. This sets the first 4 colors in the palette:
  LDA #$32   ; code for light blueish
  STA $2007  ; write to PPU $3F10
  LDA #$14   ; code for pinkish
  STA $2007  ; write to PPU $3F11
  LDA #$2A   ; code for greenish
  STA $2007  ; write to PPU $3F12
  LDA #$16   ; code for reddish
  STA $2007  ; write to PPU $3F13
You would continue to do writes to fill out the rest of the palette. Fortunately there is a smaller way to write all that code. First you can use the .db directive to store data bytes:
PaletteData:
  .db $0F,$31,$32,$33,$0F,$35,$36,$37,$0F,$39,$3A,$3B,$0F,$3D,$3E,$0F  ;background palette data
  .db $0F,$1C,$15,$14,$0F,$02,$38,$3C,$0F,$1C,$15,$14,$0F,$02,$38,$3C  ;sprite palette data
Then a loop is used to copy those bytes to the palette in the PPU. The X register is used as an index into the palette, and used to count how many times the loop has repeated. You want to copy both palettes at once, which is 32 bytes, so the loop starts at 0 and counts up to 32.
LoadPalettes:
  LDA $2002             ; read PPU status to reset the high/low latch
  LDA #$3F
  STA $2006             ; write the high byte of $3F00 address
  LDA #$00
  STA $2006             ; write the low byte of $3F00 address (32 bytes start here)
  LDX #$00              ; start out at 0
LoadPalettesLoop:
  LDA PaletteData, x    ; load data from address (PaletteData + the value in x)
                        ; 1st time through loop it will load PaletteData+0
                        ; 2nd time through loop it will load PaletteData+1
                        ; 3rd time through loop it will load PaletteData+2
                        ; etc
  STA $2007             ; write to PPU
  INX                   ; X = X + 1
  CPX #$20              ; Compare X to hex $20, decimal 32
  BNE LoadPalettesLoop  ; Branch to LoadPalettesLoop if compare was Not Equal (X is not 32 yet)
                        ; if compare was equal to 32, keep going down
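The palette loading steps can be modeled in Python to check where the 32 bytes end up. This sketch makes a few assumptions and simplifications:

- The PPU address is set to $3F00 first.
- PPUCTRL bit 2 is 0, so the address increments by 1.
- Palette mirroring is ignored.

The class and method names are hypothetical. The model also shows why the $2002 read matters: without it, a stray earlier $2006 write would shift the address you set.

```python
# Sketch of the PPU ports used by LoadPalettes: reading $2002 resets the
# high/low latch, two writes to $2006 set the address (high byte first),
# and each $2007 write stores a byte, then adds 1 to the address.
# Palette mirroring and other PPU internals are deliberately left out.

PALETTE_DATA = [
    0x0F, 0x31, 0x32, 0x33, 0x0F, 0x35, 0x36, 0x37,
    0x0F, 0x39, 0x3A, 0x3B, 0x0F, 0x3D, 0x3E, 0x0F,   # background palette
    0x0F, 0x1C, 0x15, 0x14, 0x0F, 0x02, 0x38, 0x3C,
    0x0F, 0x1C, 0x15, 0x14, 0x0F, 0x02, 0x38, 0x3C,   # sprite palette
]

class PpuPorts:
    def __init__(self) -> None:
        self.vram: dict[int, int] = {}
        self.address = 0
        self.high = 0
        self.expect_high = True                  # the high/low latch

    def read_2002(self) -> None:                 # LDA $2002
        self.expect_high = True

    def write_2006(self, value: int) -> None:    # STA $2006
        if self.expect_high:
            self.high = value & 0x3F             # PPU addresses are 14 bits wide
        else:
            self.address = (self.high << 8) | value
        self.expect_high = not self.expect_high

    def write_2007(self, value: int) -> None:    # STA $2007
        self.vram[self.address] = value
        self.address = (self.address + 1) & 0x3FFF

ppu = PpuPorts()
ppu.read_2002()
ppu.write_2006(0x3F)
ppu.write_2006(0x00)
x = 0                                            # LDX #$00
while True:
    ppu.write_2007(PALETTE_DATA[x])              # LDA PaletteData, x / STA $2007
    x += 1                                       # INX
    if x == 0x20:                                # CPX #$20 / BNE LoadPalettesLoop
        break

print(hex(min(ppu.vram)), hex(max(ppu.vram)), hex(ppu.vram[0x3F11]))
```

The 32 writes land at $3F00-$3F1F, so the sprite palette starts at $3F10 as expected.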
Once that code finishes, the full color palette is ready. One byte or the whole thing can be changed while your program is running.
Anything that moves separately from the background will be made of sprites. A sprite is just an 8x8 pixel tile that the PPU renders anywhere on the screen. Generally objects are made from multiple sprites next to each other. Examples would be Mario and any of the enemies like Goombas and Bowser. The PPU has enough internal memory for 64 sprites. This memory is separate from all other video memory and cannot be expanded.
Sprite DMA
The fastest and easiest way to transfer your sprites to the sprite memory is using DMA (direct memory access). This just means a block of RAM is copied from CPU memory to the PPU sprite memory. The on board RAM space from $0200-02FF is usually used for this purpose. To start the transfer, two bytes need to be written to the PPU ports:
  LDA #$00
  STA $2003  ; set the sprite memory (OAM) address to 0
  LDA #$02
  STA $4014  ; set the high byte (02) of the RAM address $0200, start the transfer
Once the second write is done the DMA transfer will start automatically. All data for the 64 sprites will be copied. Like all graphics updates, this needs to be done at the beginning of the VBlank period, so it will go in the NMI section of your code.
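What the $4014 write does can be sketched as a copy loop. In this Python model, the value written to $4014 picks the 256-byte RAM page ($02 means $0200-02FF). The copy starts at the OAM address written to $2003 and wraps around inside the 256 bytes of sprite memory. The function and variable names are hypothetical.

```python
# Sketch of sprite DMA: the page number written to $4014 selects a 256 byte
# block of CPU RAM, which is copied into sprite memory (OAM) starting at the
# OAM address previously written to $2003.

def oam_dma(cpu_ram: bytearray, page: int, oam_address: int = 0) -> bytearray:
    oam = bytearray(256)
    source = page << 8                                   # $02 -> $0200
    for i in range(256):
        oam[(oam_address + i) & 0xFF] = cpu_ram[source + i]   # wraps within OAM
    return oam

ram = bytearray(2048)                                    # 2KB internal RAM
ram[0x0200:0x0204] = bytes([0x80, 0x00, 0x00, 0x80])    # sprite 0 data
oam = oam_dma(ram, page=0x02)
print(list(oam[0:4]))
```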
Sprite Data
Each sprite needs 4 bytes of data for its position and tile information in this order:
1 - Y Position - vertical position of the sprite on screen. $00 is the top of the screen. Anything above $EF is off the bottom of the screen.
2 - Tile Number - this is the tile number (0 to 255) for the graphic to be taken from a Pattern Table.
3 - Attributes - this byte holds color and displaying information:
  76543210
  |||   ||
  |||   ++- Color Palette of sprite. Choose which set of 4 from the 16 colors to use
  ||+------ Priority (0: in front of background; 1: behind background)
  |+------- Flip sprite horizontally
  +-------- Flip sprite vertically
4 - X Position - horizontal position on the screen. $00 is the left side; values above $F8 push part of the sprite past the right edge (to hide a sprite completely, move its Y position off screen).
Those 4 bytes repeat 64 times (one set per sprite) to fill the 256 bytes of sprite memory. If you want to edit sprite 0, you change bytes $0200-0203. Sprite 1 is $0204-0207, sprite 2 is $0208-020B, etc
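Putting the four sprite bytes together can be sketched like this. The Python helpers below are hypothetical. They pack the attribute bits as in the diagram: palette in bits 0-1, priority in bit 5, and the flips in bits 6 and 7. They also compute where sprite n lives in the $0200-02FF buffer.

```python
# Sketch: build the 4 bytes of one sprite (Y, tile, attributes, X) and find
# where sprite n goes in the $0200-$02FF buffer that is copied by DMA.

SPRITE_BUFFER = 0x0200

def sprite_bytes(x: int, y: int, tile: int, palette: int = 0,
                 behind_background: bool = False,
                 flip_h: bool = False, flip_v: bool = False) -> bytes:
    attributes = palette & 0b11                     # bits 0-1: palette
    attributes |= int(behind_background) << 5       # bit 5: priority
    attributes |= int(flip_h) << 6                  # bit 6: horizontal flip
    attributes |= int(flip_v) << 7                  # bit 7: vertical flip
    return bytes([y, tile, attributes, x])

def sprite_address(n: int) -> int:
    if not 0 <= n < 64:
        raise ValueError("there are 64 sprites, numbered 0-63")
    return SPRITE_BUFFER + 4 * n

print(sprite_bytes(x=0x80, y=0x80, tile=0).hex(" ").upper())  # sprite 0 in this lesson
print(hex(sprite_address(2)), hex(sprite_address(63)))
```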
Turning NMI/Sprites On
The PPU port $2001 is used again to enable sprites. Setting bit 4 to 1 will make them appear.
NMI also needs to be turned on, so the Sprite DMA will run and the sprites will be copied every frame. This is done with the PPU port $2000. The Pattern Table 0 is also selected to choose sprites from. Background will come from Pattern Table 1 when that is added later.
PPUCTRL ($2000)
76543210
| ||||||
| ||||++- Base nametable address
| ||||    (0 = $2000; 1 = $2400; 2 = $2800; 3 = $2C00)
| |||+--- VRAM address increment per CPU read/write of PPUDATA
| |||     (0: increment by 1, going across; 1: increment by 32, going down)
| ||+---- Sprite pattern table address for 8x8 sprites (0: $0000; 1: $1000)
| |+----- Background pattern table address (0: $0000; 1: $1000)
| +------ Sprite size (0: 8x8; 1: 8x16)
+-------- Generate an NMI at the start of the
          vertical blanking interval (0: off; 1: on)
And the new code to set up the sprite data:
  LDA #$80
  STA $0200        ; put sprite 0 in center ($80) of screen vertically
  STA $0203        ; put sprite 0 in center ($80) of screen horizontally
  LDA #$00
  STA $0201        ; tile number = 0
  STA $0202        ; color palette = 0, no flipping
  LDA #%10000000   ; enable NMI, sprites from Pattern Table 0
  STA $2000
  LDA #%00010000   ; no intensify (black background), enable sprites
  STA $2001
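The PPUCTRL value is built the same way as PPUMASK. In this Python sketch the parameter names are made up, and bit 6 is always left at 0. It confirms that #%10000000 means NMI on with sprites from Pattern Table 0. It also shows the value once backgrounds come from Pattern Table 1.

```python
# Sketch: build a PPUCTRL ($2000) byte from the fields in the table.
# Parameter names are invented for this example; bit 6 stays 0.

def ppuctrl(nametable=0, increment_32=False, sprite_table=0, bg_table=0,
            tall_sprites=False, nmi=False) -> int:
    value = nametable & 0b11                 # bits 0-1: base nametable
    value |= int(increment_32) << 2          # bit 2: VRAM increment 1 or 32
    value |= (sprite_table & 1) << 3         # bit 3: sprite pattern table
    value |= (bg_table & 1) << 4             # bit 4: background pattern table
    value |= int(tall_sprites) << 5          # bit 5: 8x16 sprites
    value |= int(nmi) << 7                   # bit 7: NMI at start of VBlank
    return value

print(f"%{ppuctrl(nmi=True, sprite_table=0):08b}")              # value used above
print(f"%{ppuctrl(nmi=True, sprite_table=0, bg_table=1):08b}")  # backgrounds from table 1
```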
Putting It All Together
Download and unzip the sample files. All the code above is in the sprites.asm file. Make sure sprites.asm, mario.chr, and sprites.bat are all in the same folder as NESASM3, then double click sprites.bat. That will run NESASM3 and should produce the sprites.nes file. Run that NES file in FCEUXD SP to see your sprite! Tile number 0 is the back of Mario's head and hat; can you see it? Edit sprites.asm to change the sprite position (0 to 255), or to change the color palette for the sprite (0 to 3).
You can choose the PPU viewer in FCEUXD SP to see both Pattern Tables, and both Palettes.
Next Week: multiple sprites, reading controllers
I am curious, however... Is there a different value to use if, say, you had both PRG and CHR ROM like with MMC1 or MMC3?
Mappers like MMC1 and MMC3 use more than a single write to do bank switching. They also have more logic so the ROM doesn't respond to a write command, meaning no bus conflicts.
Bank Switching Code
The final part is to write your bank switching code. This subroutine will take a bank number in the A register and switch the CHR bank to it immediately. The actual switch is done by writing the desired bank number anywhere in the $8000-FFFF memory range. The cart hardware sees this write and changes the CHR bank.
  ... your game code ...
  LDA #$01         ;;put new bank to use into A
  JSR Bankswitch   ;;jump to bank switching code
  ... your game code ...
Bankswitch:
  STA $8000        ;;new bank to use
  RTS
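Here is a tiny Python model of what the cart does on that write. It assumes the two-bank setup from this lesson and ignores bus conflicts. The class name Cnrom and its methods are hypothetical. Unused high bits of the written value are modeled with %, which is equivalent to masking when the bank count is a power of two.

```python
# Sketch of CNROM CHR bank switching: any CPU write to $8000-$FFFF latches a
# bank number, and from then on all PPU reads of $0000-$1FFF come from that
# 8KB CHR bank. Simplified model: no bus conflicts, two banks (.ineschr 2).

class Cnrom:
    def __init__(self, chr_banks: list[bytes]) -> None:
        self.chr_banks = chr_banks
        self.bank = 0

    def cpu_write(self, address: int, value: int) -> None:
        if 0x8000 <= address <= 0xFFFF:             # anywhere in PRG ROM space
            self.bank = value % len(self.chr_banks)

    def ppu_read(self, address: int) -> int:
        return self.chr_banks[self.bank][address & 0x1FFF]

cart = Cnrom([bytes([0xAA]) * 8192, bytes([0x55]) * 8192])
print(hex(cart.ppu_read(0x0000)))     # bank 0 after power-up in this model
cart.cpu_write(0x8000, 0x01)          # LDA #$01 / JSR Bankswitch
print(hex(cart.ppu_read(0x0000)))     # the whole 8KB range now reads bank 1
```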
Nice to see Nerdy Nights back.
I found a little typo in the above part. You talk about using the X register (both in the paragraph and in the code comments) but you do everything with A. I don't think anyone who has made it far enough to complete pong will get confused by that, but for completeness' sake, you might want to fix it
Okay, I didn't see it.
Bus Conflicts
When you start running your code on real hardware there is one catch to worry about. For basic mappers, the PRG ROM does not care if it receives a read or a write command. It will respond to both like a read by putting the data on the data bus. This is a problem for bank switching, where the CPU is also trying to put data on the data bus at the same time. They electrically fit in a "bus conflict". The CPU could win, giving you the right value. Or the ROM could win, giving you the wrong value. This is solved by having the ROM and CPU put the same value on the data bus, so there is no conflict. First a table of bank numbers is made, and the value from that table is written to do the bank switch.
  ... code ...
  LDA #$01            ;;put new bank to use into A
  JSR Bankswitch      ;;jump to bank switching code
  ... code ...

Bankswitch:
  TAX                 ;;copy A into X
  STA Bankvalues, X   ;;new bank to use; the ROM byte at Bankvalues+X equals A
  RTS

Bankvalues:
  .db $00, $01, $02, $03   ;;bank numbers
The X register is used as an index into the Bankvalues table, so the value written by the CPU will match the value coming from the ROM.
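The following Python sketch compares a plain `STA $8000` with the table approach. It assumes conflicting bits resolve to 0 (a bitwise AND). That is only one possible outcome, since, as the text says, either side could win. The ROM contents and the $C100 table address are made up for the example.

```python
def resolve_bus(cpu_value, rom_value):
    """Value the mapper latches when CPU and ROM both drive the data bus.

    Assumption: conflicting bits resolve to 0 (bitwise AND). Real boards are
    not guaranteed to behave this way; the point is only that the result is
    unreliable unless the two values match.
    """
    return cpu_value & rom_value


def naive_switch(rom, a):
    # STA $8000: the ROM also drives whatever byte happens to live at $8000.
    return resolve_bus(a, rom[0x8000])


def table_switch(rom, bankvalues, a):
    x = a                     # TAX
    address = bankvalues + x  # STA Bankvalues, X
    return resolve_bus(a, rom[address])


# Hypothetical ROM contents: $78 at $8000 and the Bankvalues table at $C100.
rom = {0x8000: 0x78, 0xC100: 0x00, 0xC101: 0x01, 0xC102: 0x02, 0xC103: 0x03}
BANKVALUES = 0xC100
```

With these bytes, `naive_switch(rom, 1)` latches 0 (the wrong bank), while `table_switch(rom, BANKVALUES, a)` returns `a` for every bank 0-3, because the byte in ROM at the written address is the same as the value the CPU writes.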
you could do something like
  LDA #$03
  STA Bankvalues+3   ;;the table byte at Bankvalues+3 is $03, so CPU and ROM agree
if you knew you wanted to switch to bank 3, but it's better to reuse the general bank switching function. That way if you end up switching mappers you only have to change one area of code.
.db causes the bytes to be written at that point in the code. Data doesn't automatically start at $8000 or anything. Even your program code won't start at $8000 unless you tell the assembler to put it there with an .org directive (well, the addresses your labels point to won't start at $8000).
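The point about `.db` and `.org` can be shown with a toy Python model of an assembler's location counter. `.org` sets the current address, a label records it, and each `.db` byte is placed there and advances it. This is a hypothetical sketch, not how NESASM is implemented, and it ignores `.bank` entirely.

```python
def toy_assemble(source):
    """Toy model of an assembler's location counter (not NESASM itself).

    Supports only '.org $XXXX', 'Label:' and '.db $NN, $NN, ...' lines.
    Returns (labels, memory): label -> address, address -> byte.
    """
    pc = None
    labels, memory = {}, {}
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith(".org"):
            pc = int(line.split()[1].lstrip("$"), 16)
        elif pc is None:
            raise ValueError("no .org yet, so there is no address to place this at")
        elif line.endswith(":"):
            labels[line[:-1]] = pc  # a label is just the current address
        elif line.startswith(".db"):
            for item in line[len(".db"):].split(","):
                memory[pc] = int(item.strip().lstrip("$"), 16)
                pc += 1  # each byte advances the counter
        else:
            raise ValueError(f"toy assembler does not understand: {line!r}")
    return labels, memory
```

For example, `toy_assemble(".org $E000\nBankvalues:\n.db $00, $01, $02, $03 ;;bank numbers")` puts the `Bankvalues` label at $E000 and the four bytes at $E000-$E003. Change the `.org` and the table moves with it.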
Ahh okay, so when you LDA #$03 you're putting 3 into the A register (CPU), and then storing it into the BankValue (ROM) which signals the bank switch? This way the CPU & ROM match?
That's the key, the values have to match or you will get the wrong result in the bankswitch. ROM is read only so you can't change the data there.
Dead link.
Can you please update it to a new site (maybe something better has come along?) or link to an archived copy?
The hardest part of learning the code in NES 6502 is understanding why you must do something. Most of the time I can follow the code and understand what it does, I'm just not far enough along to see how it will be useful.
Thanks in advance. I know I keep on bringing up dead threads. My bad.
You can do a lot of things with this. For example:
1. Bank 0 can be your start screen graphics where you have a giant ass picture. Bank 1 could contain main game stuff, 2 the cutscenes, and 3 the end screen. This is handy whenever you have too many graphics to fit into one CHR bank.
2. You can animate the entire background and sprite set at once. A la
3. Or you can do something like this:
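For the second idea, animating everything at once, the game only needs to decide which bank to show on each frame. Here is a minimal Python sketch of one possible schedule. It assumes each bank is held for a fixed number of frames and the banks are cycled in order. The function names and the 8-frame step are hypothetical choices.

```python
def animation_bank(frame, frames_per_step=8, first_bank=0, num_banks=4):
    """CHR bank to display on a given frame when cycling through banks.

    Hypothetical schedule: hold each bank for frames_per_step frames, then
    move to the next, wrapping around after num_banks banks.
    """
    return first_bank + (frame // frames_per_step) % num_banks


def bank_switch_frames(total_frames, **schedule):
    """Frames on which the game actually needs to call Bankswitch (bank changes)."""
    switches = []
    current = None
    for frame in range(total_frames):
        bank = animation_bank(frame, **schedule)
        if bank != current:
            switches.append((frame, bank))
            current = bank
    return switches
```

`bank_switch_frames(40)` gives a switch every 8 frames through banks 0-3 and back to 0. The `first_bank` parameter lets you keep, say, bank 0 for a title screen and animate through the others. Calling Bankswitch during vblank (for example from NMI) is an assumption here, chosen so the swap doesn't land in the middle of a rendered frame.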
I've personally used it for his first example: I generally try to make my title screens a little fancier, so I use an entire pattern table for it. Then I have another pattern table for the gameplay. That way I don't have to try to piece something together from the random tiles I have for the main part of the game. Although I did this for Study Hall and it turned out pretty cool.
So a game like Centipede would use bank switching to animate the centipede and the objects that you shoot? I assume that each bank is still limited to the same 16 color palette.
I confused myself. I now see that the animation is the shifting of the enemy back and forth plus they change their form.
I could see how it would be useful in the first example.
Is bank switching how they built Mega Man? I know there was a trick they used in order to get his face a different color than the rest of his body to trick the 4 color limit.
And BAN used 4 frame animation. Though you can't tell unless you get real close.
Thanks for clarifying the Mega Man meta-sprite.
Sorry, guys, I just don't get this. At the end of the bankswitch subroutine the value in X gets stored in the bankvalues area, then what? What part of the code actually causes the bank switching to happen? Sorry if this is a dumb question.
It's storing it into the $8000 address, right?
STA $8000 ;;new bank to use
What section of the code does it write to $8000? I can't seem to find it.
"The actual switch is done by writing the desired bank number anywhere in the $8000-FFFF memory range."
  .bank 0             ;;code is stored starting at $C000
  .org $C000
  ... code ...
  LDA #$01            ;;load A with $01 (for bank 1)
  JSR Bankswitch      ;;jump to bank switching code
  ... code ...

Bankswitch:
  TAX                 ;;copy $01 into X
  STA Bankvalues, X   ;;write $01 to address Bankvalues+$01, which already holds $01
                      ;;(the table sits somewhere in $C000-$DFFF); this avoids the bus conflict
  RTS

Bankvalues:
  .db $00, $01, $02, $03   ;;bank numbers
Since Bankvalues is stored in the $8000-FFFF memory range, the mapper sees any write to it as a bank switch. You can write to ANY address in that range, so you write the bank number to Bankvalues + X, the table entry that already holds that same number.
Hope that helps clarify. This is the way I understand it.
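To answer "what part actually causes the switch?" with numbers, here is a Python sketch of the address arithmetic for `STA Bankvalues, X`. The $C123 address for Bankvalues is hypothetical. In a real build the assembler decides it.

```python
def sta_abs_x(base, x):
    """Effective address of STA base,X on the 6502 (wraps at 16 bits)."""
    return (base + x) & 0xFFFF


def is_mapper_write(address):
    """CNROM-style boards watch every CPU write in $8000-$FFFF."""
    return 0x8000 <= address <= 0xFFFF


BANKVALUES = 0xC123               # hypothetical address the assembler gave the table
table = [0x00, 0x01, 0x02, 0x03]  # the .db bytes stored starting at BANKVALUES

for x, rom_byte in enumerate(table):
    address = sta_abs_x(BANKVALUES, x)
    print(f"X={x}: STA Bankvalues,X writes to ${address:04X}, "
          f"mapper sees it: {is_mapper_write(address)}, ROM byte there: ${rom_byte:02X}")
```

Every write lands inside $8000-FFFF, so the mapper treats it as a bank switch. The ROM byte at each address equals X, which is the value being written.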