UNIVERSITY OF OSLO Department of Informatics

## **Active Echo**

High Precision Ranging in Wireless Sensor Networks

Master thesis

Nikolaj Andersen

May 2, 2007



# Preface

During the past five years I have attended the Master's Degree Program at the Department of Informatics, University of Oslo. This thesis concludes my work with the degree Master of Informatics in Microelectronics. The master thesis project was initiated in January 2006 and finished in May the following year. During this project I have worked closely with Håkon K. Olafsen, and together we have successfully designed and performed measurements on a chip. The results are included in both Håkon's thesis and mine. The main focus of the project has, however, been on developing a ranging scheme for wireless sensor networks, and the individual part of this thesis contains my results from this work.

I would like to thank my supervisor Tor Sverre Lande for his guidance and encouragement, and for being such a great source of inspiration. Håkon Hjortland also deserves special thanks, since his input has been extremely valuable during the work on this thesis. I would also like to express my gratitude to the other part of team Håkon, namely Håkon K. Olafsen, for countless and sometimes endless discussions on more or less relevant topics, and for valuable input.

Next, I would like to give my thanks to Hans and Håvard for providing assistance when needed, and to the rest of the MES group, including but not limited to Dag, Philip, Mats, Snorre, Lena, Johannes and Yngvar, for providing an inspiring environment.

To the boys and girls at the lab, including Håvard, Trygve, Jan Erik, Eirik, Ole-Petter, Dag, Øyvind, Jostein, Svein, Jens Petter and Jenny, and to my colleagues at Novelda, Kristian, Kjetil and Claus: Thanks for the coffee and lunch breaks, the valuable discussions and inputs, and for the superb social environment.

It should also be mentioned that this thesis would literally not have been the same without the excellent LaTeX support provided by Team Håkon.

To the veilabben crew and the rest of my comrades here at the University, and to my friends and family: Thanks for keeping me sane.


# Abstract

Wireless Sensor Networks are an increasingly popular technology and have received considerable attention in recent years. Positioning is a common feature in this technology, and a vital part of almost all positioning algorithms is range estimation. Because of the small device sizes, low power consumption and low cost of wireless sensor nodes, many of the currently available ranging technologies are unsuitable in a Wireless Sensor Network. This calls for new and perhaps unorthodox approaches.

In this thesis a novel ranging scheme utilizing the high bandwidths of Ultra-Wide Band Impulse Radio signals to achieve high precision and accuracy is proposed. Through the use of Continuous Time Quantized Amplitude signal processing, the proposed scheme achieves dedicated hardware solutions based on simple digital circuits. The scheme is compatible with the constraints imposed by a Wireless Sensor Network (WSN) node, through the use of a low power receiver topology, technology scalable devices and compact size.

As part of the thesis two circuits, a ring oscillator and a Time Difference Measuring Circuit, have been realized in 90nm CMOS. The measurement results provided are of great interest, both generally and to a future implementation of the proposed scheme.


# Contents

- Preface
- Abstract
- Table of Contents
- 1 Introduction
  - 1.1 A brief historical overview
  - 1.2 Goal of this thesis
  - 1.3 Outline of thesis
- 2 Background
  - 2.1 Chapter overview
  - 2.2 Ultra Wide Band
    - 2.2.1 Impulse Radio
    - 2.2.2 Common terminology in UWB
  - 2.3 Signal processing of UWB signals
    - 2.3.1 UWB receiver topologies
  - 2.4 Behavior of a quantized signal
    - 2.4.1 The input of the quantizer
    - 2.4.2 Thresholding
  - 2.5 Samplers
    - 2.5.1 Strobed sampler
    - 2.5.2 Integrating sampler
- 3 Ranging
  - 3.1 Chapter overview
  - 3.2 Performance quantities
  - 3.3 Ranging techniques for WSN
    - 3.3.1 Received Signal Strength
    - 3.3.2 Time of Flight
  - 3.4 Proposed ranging scheme
    - 3.4.1 System Concept
  - 3.5 Device design
    - 3.5.1 Echo device
    - 3.5.2 Main device
  - 3.6 Benefits from Active Echo
  - 3.7 Summary
- 4 Active Echo architecture
  - 4.1 Chapter overview
  - 4.2 Simulation of an AWGN channel
  - 4.3 Echo device performance
    - 4.3.1 Delta error
  - 4.4 Sampler strategies in Main Device
    - 4.4.1 Delay line sampler
    - 4.4.2 Integrating sampler
    - 4.4.3 Swept threshold
    - 4.4.4 Adaptive thresholding
  - 4.5 Estimating the ToF
    - 4.5.1 Thresholding
    - 4.5.2 Max selection
    - 4.5.3 Simulated performance
  - 4.6 Regulatory considerations
  - 4.7 Sources of Error
    - 4.7.1 Signal velocity
    - 4.7.2 Delta error revisited
  - 4.8 Performance in non-ideal channel conditions
    - 4.8.1 Multipath channels
    - 4.8.2 Multi-User interference
  - 4.9 Summary
- 5 Circuit implementations
  - 5.1 Chapter overview
  - 5.2 Ring oscillator
    - 5.2.1 Introduction
    - 5.2.2 Motivation
    - 5.2.3 Overview of the circuit
    - 5.2.4 Schematic
    - 5.2.5 Layout
  - 5.3 Ring oscillator measurements
    - 5.3.1 Measurement setup
    - 5.3.2 Introduction
    - 5.3.3 Gate delay measurements
    - 5.3.4 The oscillator as a VCO
    - 5.3.5 Summary
  - 5.4 Time Difference Measuring Circuit
    - 5.4.1 Motivation
    - 5.4.2 Overview of the system
    - 5.4.3 Implementation
    - 5.4.4 Expected results
    - 5.4.5 Layout considerations
  - 5.5 TDMC measurements
    - 5.5.1 Measurement setup
    - 5.5.2 Measurements
    - 5.5.3 Summary
- 6 Concluding remarks
  - 6.1 Future work
- A Layout
  - A.1 Ring Oscillator
  - A.2 Time Difference Measuring Circuit
- B PCB and Measurement setup
  - B.1 Measurement setup
- List of Figures
- Acronyms
- Bibliography

## Chapter 1

# Introduction

Wireless communication has become a vital part of the everyday life of billions of people around the globe. Mobile phones have become a common means of communication, and more and more applications move to wireless technology platforms such as WLAN, Bluetooth and ZigBee.

A technology that has received considerable attention in recent years is the Wireless Sensor Network (WSN). A WSN is a wireless network of sensor devices, spatially distributed within an area of interest. They are densely deployed, and cooperatively monitor various physical or environmental conditions such as sound, heat or motion. Although it originally was motivated by military applications, the application areas now span from environmental applications such as animal tracking and earthquake detection to medical and health care applications. As device size continues to shrink, in-body medical sensors, for instance, become a possibility.

Sensor networks often consist of hundreds or even thousands of nodes, depending on the specific application. This calls for small, cheap and disposable devices. In addition, the sensors are often expected to last for years using only a single battery as a power source, without any possibility of replacement or recharging. Consequently, the power consumption has to be very low.

A common factor in most wireless sensor networks is that localization is a key feature. For the sensor data to be meaningful, the user needs to know the location of the node. Consider for instance a hospital where all the patients walk freely around and wear sensor nodes that monitor important body functions. The hospital staff can monitor all the patients, and if an incident occurs they are immediately notified. It is obvious that knowing the location of the unfortunate patient is vital to achieve a fast and reliable response to such incidents. Global Positioning System (GPS) is a widely used positioning technology. It consists of several satellites orbiting the globe, and aided by accurately synchronized clocks they simultaneously transmit time stamped signals. A GPS receiver with three or more satellites within range can then estimate its position by looking at the arrival time of these signals. GPS devices usually have a resolution of 5-10 m, can only be used outdoors, and are expensive in terms of cost, power and size. This renders them unsuitable for most WSN applications.

Locating the sensor nodes with high precision and accuracy is challenging, but by knowing the distance between the individual nodes, their locations can be estimated. The process of measuring this distance is referred to as ranging, and variance in the ranging leads to errors in the estimated position. High precision ranging is therefore a valuable feature in a WSN node.

Obtaining range information by measuring the propagation delay between two nodes in a local network is currently one of the most popular approaches to the WSN localization problem. By transmitting time stamped signals, the two nodes can estimate their separation distance. This does however require synchronized clocks, which can be hard to achieve. Another popular solution is to measure the strength of the received signal. This is however an imprecise and unreliable technique.

In this thesis we will address the ranging problem by developing a high precision ranging scheme within the limits of a WSN node.

### **1.1** A brief historical overview

The concept of measuring distances with electromagnetic waves is an old one. In 1904 the German inventor Christian Hülsmeyer patented a device he called the *telemobiloscope*. It could detect remote metallic objects like ships by emitting radio waves and receiving the returning echo pulses. Later the same year he filed another patent where he improved his device to not only detect objects, but also measure the distance to the object. This was done by measuring the angle between the transmitter and the mast it was fastened to.

Hülsmeyer's telemobiloscope is what we have later become accustomed to calling a *RADAR*. For reasons unknown, the first version of the RADAR was never further developed, but in the mid-thirties Germany, the U.K., the USA and Russia independently developed working radar systems. During World War II it became a crucial piece of equipment in air defense, and both ships and land installations relied on it as an early warning system for incoming airplanes.


The radar has been continuously improved since WWII, both in terms of accuracy and range. Recently a new breed of radars has been introduced, namely the Ultra-Wide Band Impulse Radio (UWB-IR) radar. They utilize the broad spectrum of the Ultra-Wide Band (UWB) pulses to achieve high resolution. Because the UWB pulses behave much like noise to other signals occupying nearby frequencies, the emitted power must be limited. The UWB-IR radars are therefore limited to short range applications. An example of such a system can be found in [Hjor 06].

## **1.2** Goal of this thesis

Inspired by the high resolution obtained by the UWB-IR radar, the main goal of this thesis is to investigate how those features can be utilized to create a ranging system for WSN applications. The UWB-IR radar has a resolution of 6.6 mm, and achieving such resolution in a WSN ranging system would be a great improvement over existing systems.

A novel ranging scheme named Active Echo will be presented in this thesis, exploring the properties of the UWB-IR radar. The target resolution is 6.6 mm or better. Since the scheme is intended for small sensor nodes, several constraints are imposed on the scheme that have to be considered. They are:

- **Power consumption** The scheme has to have a conservative power consumption if it is to be viable in a small sensor node.
- **Technology** The scheme should be implementable in a standard and commercially available technology such as CMOS to keep unit cost at an acceptable level.
- **Size** The scheme is a dedicated hardware solution, intended to be implemented together with a radio front end and possibly a processing unit such as a Micro Controller Unit (MCU). This limits the available area on the silicon die.

## **1.3** Outline of thesis

During the next three chapters, the proposed solution to the ranging problem is described and analyzed. Chapter two contains some background material and should provide a sufficient theoretical foundation for the reader. In chapter three some existing solutions are presented before the Active Echo scheme is introduced. Chapter four contains a more detailed analysis of the Active Echo architecture. Chapter five contains circuit descriptions and measurement results of the two implemented circuits. In chapter six the thesis is concluded.

A listed outline of the thesis follows:

- **Chapter two** contains background material
- **Chapter three** describes existing ranging techniques, and Active Echo is presented
- **Chapter four** provides a detailed description of the Active Echo architecture
- **Chapter five** contains circuit descriptions and measurements. Written in cooperation with Håkon K. Olafsen [Olaf 07].
- **Chapter six** contains concluding remarks and proposals for future work

## Chapter 2

# Background

Ultra-Wide Band is a promising technology for short range ranging and data communication purposes. In this chapter we will explore some of the features of this technology and provide the necessary background material for the rest of the thesis.

### 2.1 Chapter overview

The first part of this chapter explores UWB in general, and the basics of UWB and UWB-IR are briefly explained. Some of the key terms used throughout this thesis are defined. In the second part the concept of continuous time signal processing of UWB-IR signals is introduced, and the remainder of this chapter is devoted to exploring the advantages of this signal processing domain with a special focus on ranging applications.

## 2.2 Ultra Wide Band

UWB is an increasingly popular technology, as it offers attractive possibilities in wireless communication, networking, radar, imaging and positioning systems [yang 04]. In 2002 the FCC released a huge bandwidth (3.1-10.6 GHz) for commercial use in the U.S., and since then, similar regulations have been proposed in other parts of the world. This has paved the way for UWB, and both commercial and academic research institutions have shown great interest in this technology.

For sensor network applications, where low data rate and low power consumption are key features, UWB is a promising technology. At the time of writing the IEEE group for low rate Wireless Personal Area Network (WPAN), 802.15.4, is investigating an alternative physical layer using UWB. The final standard is expected to be published some time during the summer of 2007. One of the primary goals of this new physical layer is to provide high precision ranging capabilities, with an accuracy of 1 m or better.

#### 2.2.1 Impulse Radio

Traditionally, wireless data communication has taken place using modulation of a carrier wave. Wireless LANs, for instance, use a carrier frequency of approximately 2.4 GHz, and European GSM phones use a frequency of 900 or 1800 MHz. The idea behind UWB-IR is to send the signal using ultra short impulses instead of using amplitude or frequency modulation on a carrier. The simplest form of an impulse is a zero length square pulse. It occupies the entire frequency spectrum, and to other devices it behaves much like noise. In practice, a perfect impulse is not feasible, first of all because it is impossible to create, and secondly because it would disturb all other radio communication. The commercial 3.1-10.6 GHz spectrum is suitable for impulse radio, because a pulse can be shaped to fit the regulatory frequency masks. Depending on factors such as environment (outdoor/indoor) and data rate, the pulse shape can be altered to fit the intended application. It is common to use some suitable derivative of a Gaussian pulse, because it fits the frequency masks well [Bene 04].
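As an illustration of such a pulse shape, the following minimal sketch evaluates the first and second derivative of a Gaussian. The function names and the width parameter `tau` (set to 50 ps here) are assumptions made for this example and do not describe a specific pulse generator.

```python
import math


def gaussian_derivative(t, tau, order=1):
    """Un-normalized derivative of g(t) = exp(-t^2 / (2 tau^2)).

    order=1 gives the Gaussian monocycle, order=2 the Gaussian doublet.
    """
    g = math.exp(-t * t / (2.0 * tau * tau))
    if order == 1:
        return -t / tau**2 * g
    if order == 2:
        return (t * t / tau**4 - 1.0 / tau**2) * g
    raise ValueError("only order 1 and 2 are sketched here")


def sample_pulse(tau, order=1, span=4.0, points=201):
    """Sample the pulse on [-span*tau, +span*tau] as (time, value) pairs."""
    step = 2.0 * span * tau / (points - 1)
    times = [-span * tau + i * step for i in range(points)]
    return [(t, gaussian_derivative(t, tau, order)) for t in times]


if __name__ == "__main__":
    tau = 50e-12  # hypothetical pulse width parameter (50 ps)
    t_peak, _ = max(sample_pulse(tau), key=lambda s: s[1])
    print(f"monocycle peak near t = {t_peak * 1e12:.0f} ps")
```

A smaller `tau` gives a shorter pulse and therefore a wider spectrum. Fitting a particular regulatory mask would additionally require choosing the derivative order and `tau` against that mask, which is not attempted here.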

#### 2.2.2 Common terminology in UWB

The medium which the signal travels through is referred to as the *channel*. In most environments the signal from the transmitter to the receiver travels over several different paths. The shortest path, usually a straight line, is defined as the *direct path*. In addition there exist a number of extra paths. For instance, a signal bouncing off a wall or nearby object represents such a path. A channel consisting of more than one path is defined as a *multipath channel*, while a channel consisting of only the direct path is defined as an *ideal channel*. Often the direct path will also be the strongest, as is the case in an environment where there are no obstacles blocking the signal. This is defined as having Line of Sight (LOS) and is illustrated in figure 2.1a. In contrast to LOS there is Non Line Of Sight (NLOS), which is illustrated in figure 2.1b. In the case of NLOS, the direct path might not be the strongest. In a time critical application, such as range estimation or synchronization circuits, this is an important observation.

Estimating the properties and behavior of the channel is an important task, and a valuable tool when evaluating different system alternatives. Because of this, great efforts have been put into creating channel models.



Figure 2.1: Signal paths in a) a LOS environment and b) an NLOS environment

The newest and most thorough channel models available at the time this thesis was written are the models created by the IEEE 802.15.4a task group [Moli 04]. They provide an impulse response model of the channel, modeling the expected number of multipaths and their strength under different environmental conditions.

Ideal channels are usually approximated by what is known as the Additive White Gaussian Noise (AWGN) channel, consisting only of noise with uniform Power Spectral Density (PSD) over the frequencies of interest.

## 2.3 Signal processing of UWB signals

Traditionally, signal processing has been either purely digital, with both amplitude information and time information stored as quantized values, or purely analog, with continuous amplitude and time processing. A traditional RF receiver, for instance, consists of a filter and a Low-Noise Amplifier (LNA), both operating in the continuous domain in terms of amplitude and time. The output of these circuits is then converted into digital values in an Analog to Digital Converter (ADC), and further processed in a Digital Signal Processor (DSP). A popular approach in analog signal processing in integrated circuits is switched-capacitor circuits, which operate on analog amplitudes, but in discrete time. Due to their accurate frequency domain operation they are frequently used in filters [John 97].

The high speeds and low prices offered by modern CMOS technology make it a good choice for digital circuits, but the low operating voltages lead to low Signal-to-Noise Ratio (SNR) and consequently worse operating conditions for analog circuits. In addition, high precision digital signal processing requires high clock rates, which leads to excessive power consumption.

In some applications, accurate time information is more interesting than accurate amplitude information. This is the case in UWB devices, where synchronization is a fundamental problem. Closely related to the synchronization problem is the ranging problem, which in the case of time based ranging techniques boils down to estimating the arrival time of a pulse. By quantizing the signal amplitude, but not the time, at an early stage in the signal chain, the signal processing can be done using simple digital gates such as inverters. This eliminates the need for high precision clock generation and distribution, while it keeps the timing information intact and enables high precision signal processing. A certain amount of information is lost in translation. In fact, the only amplitude information left is that the signal was above a certain level, which is set by a threshold in the quantizer circuitry. This must be considered as a trade-off for the high time resolution and low power dissipation. The concept of Continuous Time Quantized Amplitude (CTQA) signal processing is taken from [Hjor 06].
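The idea of keeping the time information while discarding the amplitude can be illustrated numerically. The sketch below is not the circuit from [Hjor 06]; it is a software analogy in which a finely sampled waveform is reduced to the instants where it crosses a threshold, so that only edge times remain. The function name `threshold_crossings` and the toy waveform are assumptions made for this example.

```python
def threshold_crossings(times, values, threshold):
    """Return (rising, falling) threshold-crossing times.

    The crossing instant between two neighbouring samples is found by
    linear interpolation, so the edge times are not tied to the sample grid.
    """
    rising, falling = [], []
    pairs = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(pairs, pairs[1:]):
        below0, below1 = v0 < threshold, v1 < threshold
        if below0 == below1:
            continue  # no crossing between these two samples
        t_cross = t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
        (rising if below0 else falling).append(t_cross)
    return rising, falling


if __name__ == "__main__":
    # Toy triangular pulse, amplitude 1, sampled once per nanosecond.
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    values = [0.0, 0.0, 1.0, 0.0, 0.0]
    rising, falling = threshold_crossings(times, values, threshold=0.5)
    print("rising edges:", rising, "falling edges:", falling)
```

The output is a two valued signal described entirely by its edge times, which is the kind of signal that simple digital gates can process without a sampling clock.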

#### 2.3.1 UWB receiver topologies

Traditional receiver structures are based on mixing the incoming signal with a pulse template, preferably the same as the received pulse. The top level schematic of the receiver topology is shown in figure 2.2. The front end consists of a filter (not shown) and an LNA. The output of the amplifier and the pulse template are then mixed, using for instance a Gilbert multiplier. The mixer output is connected to a baseband signal processing device through an A/D converter, where symbol detection is obtained through correlation with a symbol template. In an ideal environment this is the optimal receiver topology [Oppe 04].

Generating the optimal template signal is however not trivial, and would lead to excessive power dissipation. Instead, a rectangular shaped pulse is often preferred as the template as it is easier to create, and yields an acceptable receiver efficiency [Siwi 04]. The biggest challenge in the mixing receiver is however to align the template accurately with the incoming signal. This requires synchronization between the devices, a requirement that can be hard to fulfill in a WSN node.

According to the Nyquist sampling theorem, the mixed signal has to be sampled with a frequency equal to or larger than twice the highest input frequency if sufficient information is to be passed on to the baseband processing unit. Sampling with frequencies close to or higher than 20 GHz using traditional A/D converter topologies is not feasible in a WSN node. An alternative approach is the so called RAKE receiver, where symbol correlation is achieved without the need for high speed sampling frequencies [Limb 05].



Figure 2.2: Top level schematic of template mixing receiver

#### Thresholded receiver

To work around some of the issues of the mixing structure, the simpler thresholding structure shown in figure 2.3 was proposed in [Meis 05]. The main difference is that no pulse template or synchronization is needed. By continuously thresholding the incoming signal the output will be a quantized version of the input waveform. The receiver topology consists of a dual slope detection scheme, where the pulse shape is detected by thresholding both the positive and negative parts of the pulse. The output is a sequence of square pulses with a time relation equal to the time relation of the upper and lower slopes of the pulse shape. For symbol detection, the output of the pulse detector can be connected to a RAKE receiver.

In this thesis we will rely on a simplified version of the thresholded receiver, where only the upper slope of the input pulse shape is thresholded. This is done partially to simplify the modeling and analysis. The simple structure and architecture of this topology does however also make it a potentially suitable choice for a future implementation. The top level schematic of the considered receiver is shown in figure 2.4.



Figure 2.3: Top level schematic of the thresholded receiver with dual slope pulse detection



Figure 2.4: Top level schematic of the single threshold receiver topology considered in this thesis

### 2.4 Behavior of a quantized signal

The two valued quantized version of the input signal provides a very coarse estimate of the original analog input, and introduces vast amounts of quantization noise. How the quantized signal looks and behaves depends on signal strength, the amount of noise, and the threshold level in the quantizer circuit. For a certain SNR, the threshold level affects the performance of the quantizer, and is an important parameter. In this section we will analyze the quantized signal in terms of threshold level.

#### 2.4.1 The input of the quantizer

The signal at the input of the quantizer is a combination of signal and noise. In real life the noise is a composite signal, consisting of contributions from Multi User Interference (MUI) and noise generated in the receiver such as thermal noise. In addition the signal will consist of multipath components. This is usually not referred to as noise, instead it is modeled as signal contributions according to an appropriate channel impulse response. Multipath and MUI will be treated later in this thesis, and for now we will simply refer to the noise as Additive White Gaussian Noise (AWGN). It is common to use an ideal channel as a first approximation, where the noise is assumed to be white, thermal noise generated in the receiver.

It should be mentioned that there is another effect that is ignored here as well. In traditional narrow band communication, fading is a frequently used term. Two signals that are out of phase could potentially cancel each other out. This can happen either as a consequence of multipath or as interference between multiple devices communicating on the same channel. However, because of the short timespan of UWB pulses this effect is usually ignored, and will not be considered in this thesis.

#### Modeling the noise

The receiver noise is assumed to be Gaussian noise. The noise n(t) can then be treated as a random variable with standard deviation  $\sigma_n$  and mean  $\mu_n$ . In the case of AWGN,  $\mu_n = 0$  and  $\sigma_n = V_n$ , where  $V_n$  is the rms voltage of the noise. Probability distributions are usually described with a Probability Density Function (PDF) and/or a Cumulative Distribution Function (CDF). Let us define a function  $CDF(x, \sigma_n, \mu_n)$  which is given by

$$CDF(x, \sigma_n, \mu_n) = P(n(t) \le x) \tag{2.1}$$

A plot of the PDF and CDF of a white Gaussian noise signal, with  $\mu_n = 0$  and  $\sigma_n = 1$ , is shown in figure 2.5.
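For a Gaussian variable both functions have closed forms, and the CDF can be evaluated with the error function. The sketch below is one way to compute them with the Python standard library. The function names are chosen for this example, and `exceed_probability` is an added helper giving the probability that the noise alone lies above a level `x`.

```python
import math


def gaussian_pdf(x, sigma=1.0, mu=0.0):
    """Probability density of a Gaussian variable at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, sigma=1.0, mu=0.0):
    """P(n <= x) for a Gaussian variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exceed_probability(x, sigma=1.0, mu=0.0):
    """P(n > x): probability that the noise alone is above the level x."""
    return 1.0 - gaussian_cdf(x, sigma, mu)


if __name__ == "__main__":
    for level in (0.0, 1.0, 2.0, 3.0):
        print(level, gaussian_cdf(level), exceed_probability(level))
```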

#### Modeling the received signal

The continuous time signal seen at the receiver can be represented in the following way in the case of an ideal channel

$$r_i(t) = \alpha_d s(t - \tau_d) + n(t) \tag{2.2}$$

where  $\alpha_d$  is the received amplitude, s(t) is the direct path signal waveform,  $\tau_d$  is the delay from transmitter to receiver, and n(t) is the received noise. In the case of a multipath channel, the signal can be represented by

$$r_m(t) = \alpha_d s(t - \tau_d) + \sum_{n=1}^l \alpha_n s(t - \tau_n) + n(t) \tag{2.3}$$

where *l* is the number of paths in addition to the direct path.

Proper modeling of a received multipath signal is a task of estimating the number of paths and their amplitudes and arrival times. Measurements providing the needed parameters can be found in [Moli 04].
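The two signal models can be sketched in a few lines. The pulse shape, path amplitudes and delays below are made-up values chosen for illustration only; they are not taken from [Moli 04].

```python
import math
import random


def pulse(t, tau=50e-12):
    """Hypothetical s(t): Gaussian monocycle scaled to a peak value of 1."""
    return -(t / tau) * math.exp(0.5 - t * t / (2.0 * tau * tau))


def received(t, paths, noise_sigma=0.0, rng=random):
    """Evaluate r(t) for a list of (amplitude, delay) paths plus noise.

    The first entry of `paths` plays the role of the direct path. With a
    single entry this is equation (2.2), with several it is equation (2.3).
    Each call draws an independent Gaussian noise value, which mimics white
    noise only on a discrete sampling grid.
    """
    signal = sum(alpha * pulse(t - delay) for alpha, delay in paths)
    noise = rng.gauss(0.0, noise_sigma) if noise_sigma > 0.0 else 0.0
    return signal + noise


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    paths = [(1.0, 10e-9), (0.5, 12e-9)]  # made-up amplitudes and delays
    for t_ns in (9.95, 11.0, 11.95):
        print(t_ns, received(t_ns * 1e-9, paths, noise_sigma=0.05, rng=rng))
```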



Figure 2.5: Plot of the PDF (a) and CDF (b) of the received noise with  $\sigma_n = 1$ 

#### Modeling the Signal-to-Noise ratio as a function of distance

The SNR is mainly affected by three factors (not considering the noise figure of the receiver):

- The transmitted power  $P_{tx}$
- The distance *D* between sender and receiver
- The noise at the receiver *P<sub>n</sub>*

The transmitted power is constrained by regulations that limit the maximum allowed transmit power, depending on whether the device is intended for indoor or outdoor use. In the U.S. the FCC has set a limit of -41.3 dBm/MHz EIRP. Over a bandwidth of 7.5 GHz (3.1-10.6 GHz) this translates into approximately -2.55 dBm, or equivalently 0.55 mW [Bene 04].
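The conversion from a spectral density limit to a total power can be checked with a short calculation. The sketch assumes that the limit is a flat power spectral density of -41.3 dBm per MHz and that the whole 7.5 GHz band is filled uniformly, which gives roughly 0.55 mW. The function names are chosen for this example.

```python
import math


def total_power_dbm(psd_dbm_per_mhz, bandwidth_mhz):
    """Integrate a flat power spectral density over a bandwidth."""
    return psd_dbm_per_mhz + 10.0 * math.log10(bandwidth_mhz)


def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


if __name__ == "__main__":
    p_dbm = total_power_dbm(-41.3, 7500.0)  # 3.1-10.6 GHz is 7500 MHz
    print(f"{p_dbm:.2f} dBm = {dbm_to_mw(p_dbm):.3f} mW")
```

A real pulse does not fill the mask uniformly, so the power actually radiated by a compliant device will be lower than this upper bound.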

The transmitted signal suffers from a pathloss, which follows a squared relation to the distance *D* the signal has to travel. Traditionally the frequency dependent attenuation over a multipath free channel is calculated using Friis formula [Frii 46] which is given by

$$A_{FS}(f) = \frac{(4\pi)^2 D^2 f^2}{G_T G_R c^2} \tag{2.4}$$


where  $G_T$  and  $G_R$  are the antenna gains of the transmitter and receiver respectively and *c* is the speed of light. In a multipath channel the pathloss is slightly different. A general formula for the channel gain  $\alpha$  is

$$\alpha(D) = \frac{c_0}{\sqrt{D^{\gamma}}} \tag{2.5}$$

where  $\gamma$  is the pathloss exponent and  $c_0$  is a constant that can be tuned according to a reference pathloss  $PL_0$  at distance  $D_0 = 1$  m. In the case of an ideal channel,  $\gamma = 2$  and  $c_0 = \frac{c}{4\pi f}$  (so that  $\alpha^2 = 1/A_{FS}$ ) if the antennas are assumed to be ideal isotropic antennas with a gain of 1. For a typical multipath NLOS channel the  $\gamma$  value is higher than 2, while a typical LOS channel might have a  $\gamma$  value lower than 2. Pathloss parameters for different multipath channel scenarios can be found in [Moli 04]. As an example, a residential NLOS channel has  $\gamma = 4.58$  while a residential LOS channel has  $\gamma = 1.79$ .

The received signal power is found from

$$P_r(D) = \alpha(D)^2 \cdot P_{tx} \tag{2.6}$$

The noise at the receiver is, as already mentioned, white Gaussian noise. It is assumed here that the available noise power mainly stems from thermal noise in the receiver. This noise is often referred to as the noise floor, and can be calculated from the following formula:

$$P_n = -174 \, \mathrm{dBm} + 10 \log(BW) \tag{2.7}$$

Using a noise bandwidth of 10 GHz gives  $P_n = -74$  dBm = 39.8 pW. Across a 50  $\Omega$  load this equals  $\sqrt{50 \Omega \cdot 39.8 \text{ pW}} = 44.6 \mu \text{V}_{rms}$ .

The SNR can then be found from

$$SNR = 10 \log\left(\frac{P_r(D)}{P_n}\right) \tag{2.8}$$
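The chain from equations (2.5) to (2.8) can be checked numerically. The following is a minimal sketch, not the simulation code used in this thesis; the function names are hypothetical, and the example values for  $P_{tx}$  and  $c_0$  in the usage part are assumptions picked only for illustration.

```python
import math


def noise_floor_dbm(bandwidth_hz):
    """Thermal noise floor, equation (2.7)."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz)


def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** (p_dbm / 10.0) * 1e-3


def rms_voltage(power_w, resistance_ohm):
    """RMS voltage corresponding to a power dissipated in a resistance."""
    return math.sqrt(resistance_ohm * power_w)


def channel_gain(distance_m, c0, gamma):
    """Amplitude gain alpha(D), equation (2.5)."""
    return c0 / math.sqrt(distance_m ** gamma)


def snr_db(distance_m, p_tx_w, c0, gamma, bandwidth_hz):
    """SNR in dB at a given distance, equations (2.6) and (2.8)."""
    p_r = channel_gain(distance_m, c0, gamma) ** 2 * p_tx_w
    p_n = dbm_to_watt(noise_floor_dbm(bandwidth_hz))
    return 10.0 * math.log10(p_r / p_n)


if __name__ == "__main__":
    p_n = dbm_to_watt(noise_floor_dbm(10e9))
    print("noise floor [dBm]:", noise_floor_dbm(10e9))
    print("noise voltage in 50 ohm [V rms]:", rms_voltage(p_n, 50.0))
    # Assumed example link: 1 uW transmit power, c0 = 0.01, ideal channel.
    for d in (1.0, 2.0, 5.0, 10.0):
        print(d, "m:", round(snr_db(d, 1e-6, 0.01, 2.0, 10e9), 1), "dB")
```

With  $\gamma = 2$  the SNR falls by 20 dB for every tenfold increase in distance, and a larger  $\gamma$  steepens this slope proportionally.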

#### Peak SNR

SNR as described in section 2.4.1 refers to the average signal power versus noise power. However, when the signal is thresholded the regular SNR term is no longer valid, as it is the instantaneous amplitude, and not the average signal power, that is thresholded. In [Lee 02] the term peak SNR is defined as

$$SNR_p = \frac{V_{peak}^2}{\sigma_n^2} \tag{2.9}$$

Normalizing the signal to its peak strength, it simplifies to

$$SNR_p = \frac{1}{\sigma_n^2} \tag{2.10}$$

Because noise and signal levels, and thus the SNR, can vary from implementation to implementation,  $SNR_p$  is a useful expression when performing system level performance analysis and will be used extensively throughout this thesis.

#### 2.4.2 Thresholding

After the signal has passed the quantizer, where the signal is thresholded, the previous signal definition no longer applies. The quantized signal is represented by

$$R(t) = \begin{cases} 1 & r(t) \ge \theta_t \\ 0 & r(t) < \theta_t \end{cases} \tag{2.11}$$

where  $\theta_t$  is the threshold of the quantizer circuit. There are two possible sources of error in the quantized signal, namely false alarm and missed detection. A false alarm occurs if the input crosses the threshold when there is no signal present, and a missed detection occurs when the threshold is not crossed even though there is a signal present.

#### Probability of false alarm

This is the probability that, at any time, the input of the quantizer is above the threshold in the presence of white Gaussian noise only. It can be expressed by

$$P_{fa} = P(n(t) \ge \theta_t) \tag{2.12}$$

It can be rewritten to

$$P_{fa} = P(w(t) \ge \hat{\theta}) = 1 - CDF(\hat{\theta}, 1, 0) \tag{2.13}$$

where

$$w(t) = \frac{n(t)}{\sigma_n} \tag{2.14}$$

and

$$\hat{\theta} = \frac{\theta_t}{\sigma_n} = \theta_t \cdot \sqrt{SNR_p} \tag{2.15}$$

In figure 2.6 the PDF of the input signal together with the threshold is shown, with  $\hat{\theta} = 1$ . The shaded area represents the  $P_{fa}$ . A plot of  $P_{fa}$  versus the normalized threshold  $\hat{\theta}$  is shown in figure 2.7.
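The false alarm probability for a normalized threshold can be evaluated with the error function. This is a small sketch under the Gaussian noise assumption stated above; the function names `gaussian_cdf` and `p_false_alarm` are illustrative and not part of the thesis.

```python
import math


def gaussian_cdf(x, sigma, mu):
    """CDF of a Gaussian random variable, as defined in equation (2.1)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def p_false_alarm(theta_hat):
    """Probability that unit-variance, zero-mean noise is at or above the
    normalized threshold theta_hat = theta_t / sigma_n."""
    return 1.0 - gaussian_cdf(theta_hat, 1.0, 0.0)


if __name__ == "__main__":
    for theta_hat in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(theta_hat, p_false_alarm(theta_hat))
```

For  $\hat{\theta} = 1$ , the case shown in figure 2.6, the function returns roughly 0.16.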



Figure 2.6: Shows the PDF of a noisy input in the absence of a signal with  $\sigma_n = 1$  and  $\hat{\theta}_t = 1$ . The colored area represents the  $P_{fa}$ 

#### Probability of missed detection

The probability of the quantizer not detecting an incoming signal with received peak amplitude  $\alpha \cdot S_{TX}$  is

$$P_{md} = CDF(\theta_t, \sigma_n, \alpha \cdot S_{TX}) \tag{2.16}$$

This is illustrated in figure 2.8. As the SNR increases the PDF is pushed to the right. The  $P_{md}$  is the area of the PDF to the left of the threshold.

In figure 2.9 the  $P_{md}$  is plotted against the threshold for different SNRs.



Figure 2.7: Shows how  $P_{fa}$  decreases for increasing values of  $\hat{\theta}$ 



Figure 2.8: Shows PDFs of input signals with different SNRs. The area to the left of the threshold is the probability of the quantizer missing an incoming pulse



Figure 2.9:  $P_{md}$  versus threshold for different SNRs, with  $\hat{\theta}$  along the x-axis

#### Setting the threshold

From the results provided in the two previous sections, two obvious conclusions can be drawn, namely that increasing the threshold increases the chance of missing a pulse and decreases the chance of false pulse detection. In figure 2.10 the two probabilities are plotted against one another for different SNRs. It is clear that there exists an optimal threshold value for every SNR.
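The trade-off can be made concrete with a simple grid search. The sketch below assumes that the optimal threshold is the one minimizing the unweighted sum  $P_{fa} + P_{md}$ ; that weighting, the function names and the grid resolution are assumptions made for illustration.

```python
import math


def gaussian_cdf(x, sigma, mu):
    """CDF of a Gaussian random variable with the given sigma and mean."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def p_false_alarm(theta, sigma_n):
    """Noise alone is at or above the threshold."""
    return 1.0 - gaussian_cdf(theta, sigma_n, 0.0)


def p_missed_detection(theta, sigma_n, amplitude):
    """Signal peak plus noise stays below the threshold."""
    return gaussian_cdf(theta, sigma_n, amplitude)


def best_threshold(sigma_n, amplitude, steps=1000):
    """Search thresholds between 0 and the signal peak and return the
    (threshold, total error) pair with the lowest P_fa + P_md."""
    best = None
    for i in range(steps + 1):
        theta = amplitude * i / steps
        total = p_false_alarm(theta, sigma_n) + p_missed_detection(theta, sigma_n, amplitude)
        if best is None or total < best[1]:
            best = (theta, total)
    return best


if __name__ == "__main__":
    # Signal normalized to a peak of 1, so SNR_p = 1 / sigma_n**2.
    for sigma_n in (0.5, 0.2, 0.1):
        theta, total = best_threshold(sigma_n, 1.0)
        print("sigma_n =", sigma_n, "threshold =", theta, "total error =", total)
```

Under these assumptions the search settles halfway between the noise mean and the signal peak, and the total error at that point shrinks rapidly as  $\sigma_n$  decreases.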

#### Sweeping the threshold

As long as the range and thus SNR is unknown, setting the threshold to an optimal level prior to the reception of a pulse is not possible. Instead of setting the threshold to a preprogrammed level, the threshold can be swept over a range of interest. As we will see in the next section, this can be utilized in cooperation with an appropriate sampling scheme.

## 2.5 Samplers

If the CTQA signal is to be useful to a digital domain processing unit, it needs to be converted from the continuous domain. In this section several samplers exploiting the properties of a CTQA signal will be described. By sampler, we will understand a device storing the CTQA signal at a discrete point in time.

Figure 2.10:  $P_{fa}$  versus  $P_{md}$ . Each SNR has its optimal threshold setting, with the lowest total error rate

#### 2.5.1 Strobed sampler

The strobed sampler stores the quantized signal at a time  $t_s$  after a certain event has occurred. It is useful if monitoring an incoming signal where an event is expected to occur at a given time. In figure 2.11 the strobed sampler is implemented with a simple D-Flip-Flop as the sampler.

The strobed sampler together with the quantizer is actually a kind of single bit A/D converter. The reason for stressing the term CTQA and the strobed sampling technique instead of just staying with traditional data converter terminology is the fact that the sampling time can be realized independent of any clock domain. The delay element can for instance be realized with two inverters, which in theory can be scaled to any delay above the minimum gate delay. Later in this thesis a tunable delay element will be presented, where the delay can be tuned with high precision *after production* by applying a bias to the body of the transistors.



Figure 2.11: Implementation of strobed sampler. The sampler is implemented as a D-Flip-Flop

#### **Delay line sampler**

Just observing one discrete point in time might not look like a very efficient way of evaluating the input signal. By repeating the sampling with  $t_s$  spacing between each sample, we get a sequence of samples representing the input. The sampling frequency  $F_s$  of the sequence is given by  $1/t_s$ . Assuming an inverter gate delay of 20 ps, which is actually conservative compared to what is achievable in modern CMOS technology, a delay element built from two inverters gives  $t_s = 40$  ps, and the sampling frequency of the sampler is 1/40 ps = 25 GHz. Achieving such sampling rates with a traditional ADC would not be compatible with the constraints set by a WSN node.

The delay line sampler does face a major issue, namely that it only provides a finite length portion of the incoming signal, and not a continuous stream of digital values like regular ADCs do. This means that the delay line sampler is best suited for applications where a certain event, such as the arrival of a pulse, is expected to occur within a certain time frame. In a time-based ranging system, such as the one proposed later in this thesis, it is a good candidate. A simple version of the delay line sampler is shown in figure 2.12. After the signal has been sampled by the delay line sampler, the signal can be read out from the samplers at an appropriate clock speed and further processed by standard digital circuitry.

Figure 2.12: The delay line sampler

Figure 2.13: Shows the integrating sampler. The counters are one way counters with a clock enable input

#### 2.5.2 Integrating sampler

As shown in section 2.4.2, thresholding might lead to what we have called false alarms in the presence of noise. Setting the threshold at a low level increases the false alarm probability, while it decreases the missed detection probability. When the sequence of samples gathered by the delay line sampler is evaluated in the presence of a signal, it might contain false threshold crossings in addition to the deterministic threshold crossings caused by the signal. The false crossings are however spread randomly out over the entire delay line. If the measurement is repeated, and the value of the new measurement is added to the previous value, the false threshold crossings will still be spread out over the entire delay line. The signal will however appear in the same spot as in the first run, as long as the target signal has some sort of periodic behavior and the initial conditions remain the same between each run.

An implementation of the integrating sampler is shown in figure 2.13. It is similar to the delay line sampler, but the single flip-flop is exchanged with a one way counter, increasing its value if the input is high at the rising edge of the strobe. This is equivalent to a standard counter with a clock enable input.

By repeating the measurement enough times, even signal strengths below the threshold can be recovered, since noise is added to the signal and helps the signal cross the threshold. The principle of letting noise help a weak periodic signal through a nonlinear system such as a quantizer is known as Stochastic Resonance (SR). Weak is here understood as referenced to a scale of some sort, which in the case of a quantizer is the threshold. SR requires a certain amount of noise, and it requires the signal to be smaller than the threshold level.
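A small Monte Carlo experiment can illustrate how integration recovers a pulse whose peak lies below the threshold. The sketch below is a behavioral model only, not the circuit of figure 2.13; the function name, the tap values and the noise level are assumptions for the example.

```python
import random


def integrating_sampler(signal, theta, sigma_n, runs, rng):
    """Behavioral model of an integrating sampler.

    signal  : noise-free input value seen at each delay line tap
    theta   : quantizer threshold
    sigma_n : standard deviation of the additive Gaussian noise
    runs    : number of repeated measurements (integrations)
    rng     : a random.Random instance supplying the noise
    Returns one counter value per tap.
    """
    counters = [0] * len(signal)
    for _ in range(runs):
        for tap, value in enumerate(signal):
            # The quantizer output is 1 when the noisy input reaches the threshold.
            if value + rng.gauss(0.0, sigma_n) >= theta:
                counters[tap] += 1
    return counters


if __name__ == "__main__":
    rng = random.Random(1)
    # A pulse peak of 0.8 at tap 10, below the threshold of 1.0.
    signal = [0.0] * 10 + [0.8] + [0.0] * 10
    counters = integrating_sampler(signal, 1.0, 0.5, 2000, rng)
    print(counters)
    print("tap with highest count:", counters.index(max(counters)))
```

With no noise at all the counters would stay at zero in this example, which is the sense in which the noise helps the signal across the threshold.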

Similar to SR is Suprathreshold Stochastic Resonance (SSR), introduced by Stocks in [Stoc 00]. It describes the case where the signal is not weak but SR-like behavior can still be observed. It is shown in [Stoc 00] that the output of an integrating sampler with a permanent threshold follows a binomial distribution, that is, the probability of a specific value n occurring in the sampler for a given input signal x is given by

$$P(n|x) = C_n^N \hat{P}^n (1 - \hat{P})^{N-n} \tag{2.17}$$

where  $\hat{P}$  is the probability of a threshold crossing for a given input *x* and *N* is the number of integrations.
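Equation (2.17) is the ordinary binomial probability mass function and can be evaluated directly. In this sketch `counter_pmf` is a hypothetical helper name, and the crossing probability used in the usage part is an arbitrary example value.

```python
import math


def counter_pmf(n, runs, p_cross):
    """Probability that a counter holds the value n after `runs`
    integrations, given a per-run threshold crossing probability
    p_cross, following equation (2.17)."""
    return math.comb(runs, n) * p_cross ** n * (1.0 - p_cross) ** (runs - n)


if __name__ == "__main__":
    runs, p_cross = 20, 0.3
    for n in range(runs + 1):
        print(n, counter_pmf(n, runs, p_cross))
```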

To further increase the throughput of the system the threshold can be swept over a range of interest. This is equivalent to the multilevel threshold system discussed in [Stoc 00], where it is concluded that at least for lower noise levels a multilevel threshold system is more efficient in terms of information throughput, than a permanent threshold system.

In [Hjor 06] simulations are performed on a swept threshold sampler and a permanent threshold sampler. Their performance is analyzed in terms of noise in the recovered signal instead of information throughput, and they are compared to an ideal analog average sampler, which is a linear sampler storing the exact value of the input signal between each run. As expected it is shown that for low noise levels the swept threshold sampler outperforms the permanent threshold sampler. In addition, it is shown that as the noise approaches the same level as the signal both samplers perform almost as well as the analog average sampler. From the results, we can also see how the performance of swept and permanent threshold samplers converges for higher noise levels, as noted by Stocks.

## Chapter 3

# Ranging

Measuring the distance between two remote nodes using nothing but RF signals is a challenging problem. In this chapter a method for solving this problem will be proposed.

### 3.1 Chapter overview

The first part of this chapter is devoted to some existing ranging techniques. The two major branches of wireless ranging, namely time based and signal strength based approaches, will be discussed briefly. In the second part the proposed ranging scheme is presented.

### 3.2 Performance quantities

The performance of a ranging technique can be measured from different quantities. Two frequently used terms are accuracy and precision. They are however often used with different meaning by different authors, so before the different ranging techniques are presented some definitions are in order:

**Accuracy** is the resolution, or grain size, achieved by the ranging procedure. A location system splits the target area into discrete spatial steps, and the size of these steps is referred to as the grain size. A smaller grain size leads to higher resolution, and a location system with a given resolution can only estimate the distance to a target as the integer multiple of the grain size that is closest to the actual distance.

**Precision** is the probability that the estimated distance is correct. Say that a location system with a grain size of 1 meter estimates the distance to a target to be 10 meters. If the system has a precision of 99 percent, the system will estimate this distance 99 percent of the time.

When considering a ranging system for a specific application it is important that both of these factors are evaluated. In some applications accuracy might be traded in for increased precision and vice versa.

Together, the performance quantities create an error in the estimated distance, defined as

$$\varepsilon_d = |\hat{d} - d| \tag{3.1}$$

where *d* is the actual distance and  $\hat{d}$  is the estimated distance.

## 3.3 Ranging techniques for WSN

Most ranging schemes are based on measuring one of two quantities:

- Received Signal Strength (RSS)
- Time of Flight (ToF)

In addition some schemes are based on measuring the angle of an incoming signal. This is called Angle Of Arrival (AoA). If AoA information from three or more nodes is compared, their position can be estimated. Because AoA requires either two separate antennas, or special antenna design, it is not feasible to implement it in WSN nodes. It will therefore not be discussed further here.

#### 3.3.1 Received Signal Strength

As the name indicates, RSS is based on measuring the strength of the received signal. If the strength of the transmitted signal is known, the distance can be estimated by relying on a model of the expected path loss. It requires detailed a priori knowledge about the characteristics of the channel, and is sensitive to variations in these parameters. The number of paths and the NLOS path loss for a typical office environment will not be the same as in for instance an outdoor environment. For these reasons, the RSS device has to be calibrated for a specific application.

RSS typically delivers lower spatial resolution and accuracy than time based approaches. Because it is relatively simple to implement, it is sometimes used as a coarse first step in acquiring the range, and a Time of Arrival (ToA) technique is used to achieve higher resolution [Gezi 05].

Range estimation based on RSS can often be performed on already existing signals, meaning no extra signals or radio front ends have to be created for ranging purposes. In traditional narrow band RF applications, such as WLAN, RSS is often implemented to indicate the quality of the received signal. With minimal additional hardware or software it can be used in range estimation.

In a WSN using UWB communication, RSS delivers little or no extra ranging capabilities over narrow band communication. Measuring the signal level of a pulse is more complicated than measuring the level of a carrier wave, due to the ultra short duration of the UWB pulses.

An example of a ranging system using RSS is the Distributed Radiolocation Hardware Core by Motorola [Taub 05]. It delivers an accuracy of 3 m and a precision of 0.5 m. Note that this system is based on narrow band signals.

#### 3.3.2 Time of Flight

Time of Flight (ToF) based ranging schemes are based on measuring the propagation delay of a signal between two or more nodes. A fundamental part of any ToF based approach is determining the ToA of the incoming signal. If the two nodes have a common and synchronized clock, the transmitter can attach a time stamp on the transmitted signal. The receiver can then derive the ToF by subtracting the transmission time from the ToA. The distance separating the two nodes can then be found by solving

$$d = c \cdot t_{ToF} \tag{3.2}$$

where *c* is the speed of light, usually approximated by the speed in vacuum which is close to  $3 \cdot 10^8$  m/s. A challenge faced by most traditional ToA schemes is that clock jitter becomes a major issue. Perfectly synchronized clocks are hard to realize in practice, and high precision clocks come with a certain cost. In a small WSN node where price and size are key properties, such clocks are not viable options. A solution might be to synchronize the clocks frequently. In a network consisting of many nodes, maybe in the order of thousands, this requires a central node with a precise clock for the surrounding nodes to synchronize with. In ad-hoc networks such as WSN, each node does not necessarily have a direct line of communication to the central node, meaning the information has to be passed on from node to node. In addition to adding an uncertainty to the synchronization signal, it consumes potentially large amounts of the available bandwidth and power from the individual nodes.
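To get a feeling for how strict the synchronization requirement is, equation (3.2) can be applied to the clock offset itself. The helper names below are hypothetical and the sketch assumes propagation at the vacuum speed of light.

```python
C = 299_792_458.0  # speed of light in vacuum [m/s]


def distance_from_tof(t_tof_s):
    """Distance corresponding to a one-way time of flight, equation (3.2)."""
    return C * t_tof_s


def range_error_from_clock_offset(offset_s):
    """Ranging error caused by a clock offset between transmitter and
    receiver in a one-way ToA measurement."""
    return C * abs(offset_s)


if __name__ == "__main__":
    for offset in (1e-6, 1e-9, 100e-12):
        print(offset, "s offset ->", range_error_from_clock_offset(offset), "m")
```

A clock offset of one nanosecond already corresponds to a ranging error of about 30 cm.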

#### **Time Difference of Arrival**

Time Difference Of Arrival (TDoA) is another approach to time based ranging. The major difference from traditional ToA techniques is that no clock synchronization is needed. Instead it measures the time difference between two signals. As will be shown in chapter 5, measuring time differences can be done with high precision and accuracy without the need for a high precision clock.

One way of realizing a TDoA ranging scheme is to measure the ToA of two signals traveling from two reference nodes. This does however require the two reference nodes to be synchronized.

Another approach to the TDoA technique is to measure the time a signal takes to travel back and forth between two nodes. This approach was first suggested by Scholtz et al. in [Lee 02]. The scheme proposed in this thesis is based on this approach.

## 3.4 Proposed ranging scheme

In this section a new ranging scheme is proposed. It is compatible with the goals set in the introduction of this thesis by being implementable in standard CMOS technology and offering high spatial resolution. Power consumption is also expected to be at an acceptable level. The scheme is intended to be implemented in a sensor node as a separate piece of hardware, next to already existing radio hardware. To minimize the extra cost introduced by this hardware, the size and power consumption need to be kept at a minimum.

We will start by presenting a brief overview of the concept, which we have given the name *Active Echo*.

#### 3.4.1 System Concept

To ease the process of describing and evaluating the proposed scheme, the considered network will be assumed to consist of a peer-to-peer connection between two nodes. They are prearranged in a hierarchical structure, consisting of what we have called a *main device* and an *echo device*. The main device is the device trying to measure the separation distance.

First, consider the sketch in figure 3.1. It shows the principle of a standard radar device, where the distance to a remote object is estimated by measuring the propagation time of a pulse traveling to the object and back again.


This object is passive, in the sense that the returning pulse, called the echo, is a reflection of the transmitted pulse.



Figure 3.1: Concept sketch of a radar system

Now, exchange the passive object in figure 3.1 with an echo device. The result is the sketch in figure 3.2. The main device emits a request pulse, and the echo device answers by emitting an echo pulse, hence the name Active Echo. The echo device takes a certain time to detect and respond to the received request, and this delay is called the turnaround time  $\Delta t$ . The main device measures the time  $t_{meas}$  from when the request pulse is emitted until the echo pulse is received. The estimated distance  $\hat{d}$  can be found from the following formula

$$\hat{d} = v\hat{t} = v \cdot \left[\frac{t_{meas} - \Delta t}{2}\right] \tag{3.3}$$

where v is the speed of the traveling signal, and  $\hat{t}$  is the estimated propagation delay  $t_{prop}$  between the two nodes. In a realistic multipath affected channel, it is important that  $\hat{t}$  is estimated from the direct path of the signal. Estimating  $\hat{t}$  from a multipath component adds a positive error to the result.
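Equation (3.3) translates directly into a few lines of code. The sketch below uses hypothetical names and an assumed turnaround time of 50 ns purely as an example value; it also shows how locking onto a later multipath component inflates the estimate.

```python
C = 299_792_458.0  # assumed propagation speed: speed of light in vacuum [m/s]


def active_echo_distance(t_meas_s, turnaround_s, v=C):
    """Estimated distance from the measured round-trip time, equation (3.3)."""
    return v * (t_meas_s - turnaround_s) / 2.0


if __name__ == "__main__":
    true_distance = 10.0          # meters
    turnaround = 50e-9            # assumed turnaround time of the echo device
    t_prop = true_distance / C
    t_meas = 2.0 * t_prop + turnaround

    print("direct path estimate:", active_echo_distance(t_meas, turnaround))
    # Detecting a multipath component arriving 1 ns late adds a positive error.
    print("late detection estimate:", active_echo_distance(t_meas + 1e-9, turnaround))
```

Because the round trip is halved, a timing error of 1 ns in  $t_{meas}$  maps to roughly 15 cm in  $\hat{d}$ .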



Figure 3.2: System concept of the Active Echo scheme

## 3.5 Device design

A WSN usually consists of multiple instances of similar or equal nodes. A sensor node utilizing Active Echo for ranging therefore has to incorporate both device types, being able to act as either main or echo device depending on the situation.



Figure 3.3: Implementing the two devices as a part of the PHY-Layer of a WSN node. The front end consists of filter and amplifier functions. A quantizer and pulse generator can also be shared as part of the front end

Figure 3.3 shows a sketch of how the physical (PHY) layer of an Active Echo capable sensor node might look. The front end contains basic analog parts such as an LNA and filter functions. Then the amplified and filtered signal is passed through a multiplexer. During normal operation the signal is simply passed on to the PHY layer hardware. If the node is supposed to act as an echo or main device the received signals are passed directly to the corresponding Active Echo hardware part.


#### 3.5.1 Echo device

The behavior of the echo device is fundamental to the performance of the entire system. If the echo device misses an incoming pulse, the entire ranging fails. Making the echo device too sensitive is also problematic since it makes it difficult for the device to separate the request from noise.



Figure 3.4: Top level implementation of echo device. The flip-flop and delay element are standard digital building blocks, and the front end (not shown), quantizer and pulse generator can be shared with other parts of the node.

A top level sketch of how the echo device might look is shown in figure 3.4. In addition to the LNA and filter functions (not shown), the echo device consists of a quantizer, a flip-flop, a delay element and a pulse generator. The quantizer and the pulse generator can be shared with the two other hardware parts of the WSN node. The delay element represents the turnaround time of the device, which actually also serves as a constructive part of the whole scheme in the case of multipath. This will be treated later.

The flip-flop is included to ensure that the device only responds to the first detected threshold crossing. If the measurement is to be repeated, the flip-flop has to be reset before the circuit is ready again.

The device is vulnerable to false alarms, in the sense that a false alarm in the echo device leads to a missed detection of the real request pulse, and adds an error to the estimated distance. Increasing the threshold will decrease the  $P_{fa}$ , but will also increase the  $P_{md}$ . The setting of the threshold in the echo device is therefore of great importance, and affects the performance of the whole scheme.

Defining an early false pulse detection (a false alarm before the arrival of a pulse) as a failure, the probability of such a failure occurring increases with time. If the device is turned on at time t = 0, the probability that a failure has occurred at time t = T is given by [Rice 66]

$$P_{failure} = \int_0^T P(n(t) \ge \theta_t) dt = 1 - e^{-T/\lambda} \tag{3.4}$$

where  $\lambda$  is the average time between successive downward and upward crossings of n(t) for a given level of  $\theta_t$ .

We see from equation (3.4) that, for a given listening time T, the only way of lowering the failure probability is to increase  $\lambda$  by raising the threshold. As long as the threshold in the echo device is significantly higher than  $\sigma_n$ , the failure probability can be kept at an acceptable level. Setting the threshold too high, on the other hand, limits the range of operation.
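The closed form on the right-hand side of equation (3.4) is easy to explore numerically. In the sketch below the function names are hypothetical, and the value used for  $\lambda$  is an arbitrary example, since the actual crossing interval depends on the threshold and the noise bandwidth.

```python
import math


def p_failure(listen_time_s, mean_crossing_interval_s):
    """Probability of an early false detection within the listening time,
    using the closed form 1 - exp(-T / lambda) of equation (3.4)."""
    return 1.0 - math.exp(-listen_time_s / mean_crossing_interval_s)


def max_listen_time(target_p_failure, mean_crossing_interval_s):
    """Longest listening time that keeps the failure probability at or
    below the target, obtained by inverting equation (3.4)."""
    return -mean_crossing_interval_s * math.log(1.0 - target_p_failure)


if __name__ == "__main__":
    lam = 1e-3  # assumed average time between false crossings [s]
    for listen in (1e-6, 1e-5, 1e-4, 1e-3):
        print(listen, "s ->", p_failure(listen, lam))
    print("T for 1 percent failure:", max_listen_time(0.01, lam), "s")
```

For listening times much shorter than  $\lambda$  the failure probability is approximately  $T/\lambda$ , so keeping the echo device armed only briefly helps as much as raising the threshold.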

#### The function of $\Delta t$

Normally, delays are undesired and parasitic elements which we would try to avoid. In this particular application, the delay is the result of the delay introduced by the quantizer and transmitter in the echo device. To a first level approximation, this delay is a constant, and can simply be subtracted from the measured propagation delay.

There is however also another aspect of  $\Delta t$ , where it can be turned into a useful and constructive part of the ranging. As previously explained, radars work by detecting backscatter, i.e. pulses bouncing off nearby objects. When the main device emits a request, it will receive such backscatter, together with the echo pulse. If the delay is set long enough, the backscatter will have died out before the echo pulse arrives, increasing the probability of estimating the correct arrival time. The same delay can easily be implemented in the main device to compensate for the nonlinearity introduced by  $\Delta t$ . Implementing the delay using delay elements of the same kind in both devices makes sure that deviations in delay between the two devices caused by matching errors in the produced chips are minimized.

#### 3.5.2 Main device

The purpose of the main device is to measure the time difference between the transmission time of the request pulse and the arrival time of the echo pulse. A top level implementation is shown in figure 3.5. The ranging is initiated by the control logic pulling the input to the pulse generator high, making it transmit the request. The TDMC is shown as a delay line sampler where the input to the pulse generator also serves as the strobe signal initiating the sampled sequence. A delay element is also included, which, as explained in the previous section, compensates for the turnaround time in the echo device.

## 3.6 Benefits from Active Echo

The idea of measuring range without synchronization was first introduced in the Two-Way Ranging Scheme by Scholtz et al. [Lee 02], where range is estimated by measuring the propagation delay of a signal traveling back and forth between two peer nodes. It shares similarities with the scheme presented in this thesis, since both schemes obtain high temporal and spatial resolution without the need for synchronization.

Figure 3.5: Top-level implementation of main device. The TDMC is shown as a simple delay line sampler, but could also be implemented as an integrating sampler or as the body biased TDMC described in a later chapter.

The Two-Way Ranging Scheme relies on a complex receiver system in both the main node and what we have called the echo node, shown in figure 3.6. It works by first synchronizing with the incoming signal using template mixing and A/D conversion. Once synchronization is achieved, the parallel samplers sample the incoming signal with small offsets between each sampler, and this way the effective sampling rate is increased. The echo device returns the request after a specified delay. The parallel samplers are in turn used to resolve errors due to multipath, and any offset introduced by this is reported to the main device.

In Active Echo, the echo device operates only in the continuous time domain without any correlation, clocks or time and power consuming signal processing. The extra hardware needed in the echo device is minimal, if a front end with the necessary amplifier and filter functions is assumed to be present in the existing radio part. The main device requires a bit more complex hardware. Compared to the receiver topology used in the Two-Way ranging scheme though, the main device topology is obviously simpler.

In figure 3.7 a top level overview of the different parts of the Active Echo hardware is shown together with their respective time domains. Notice that the samplers are placed in a separate time domain called sequenced time. This is to illustrate that the samples are taken in sequence referred to the emission time of the request pulse, and not the clock of the device. It can be argued that the emission time of the pulse probably is correlated to the main device clock, which is true. The delay from the emission time until the sampling sequence is initiated is however realized with delay elements rather than a clock signal to remove the need for high speed and high precision clocks. The resolution of the sequenced time domain is only limited by the delay through the delay elements, and as pointed out in section 2.5.1, 25 GHz is not an unrealistic figure. A CMOS chip with a clock running at such rates is hardly realistic, and even if it could be done, it would be incompatible with the constraints of a WSN node. The samplers are therefore put in a separate time domain to underline the fact that the start time of the sampled sequence is not necessarily synchronous with the clock domain of the device.

Figure 3.6: Receiver schematic of Two-Way Ranging scheme. The figure is copied from [Lee 02] and included to illustrate the complexity level of the receiver.

## 3.7 Summary

The proposed ranging scheme solves the ranging problem by implementing a dedicated hardware solution and CTQA signal processing. Two separate devices are needed: one very simple echo device, and a slightly more complicated main device. Both are based on basic digital components such as inverters and flip-flops, making them implementable in commercial and cheap processes such as CMOS. This also means low power consumption and small size, making it compatible with the target specifications.

Figure 3.7: The different parts of the Active Echo hardware and their respective time domains

## Chapter 4

# Active Echo architecture

In the previous chapter an overview of the proposed ranging scheme, Active Echo, was presented. The goal of this chapter is to investigate the scheme as a concept and explore different sampling strategies in the main device.

## 4.1 Chapter overview

Instead of beginning the analysis by diving deep into details, it might be a good idea to take one step back to get a better grasp of the challenges lying ahead. The complete system consists of a continuous signal path, from the main device, through the echo device and back to the main device again. On this path, the traveling signal faces two major obstacles. First the request pulse has to trigger the echo device, making it emit an echo pulse. The returning echo then has to be received and evaluated properly by the main device. Based on the arrival time of the returning echo pulse the main device has to estimate the distance separating the two devices.

The first part of the chapter provides an example of a typical ideal channel. It is included to put some of the values and assumptions used throughout the rest of the chapter in a context.

In the following sections the performance of the two devices will be discussed. In the end of the chapter, non-ideal channel conditions will be briefly discussed.

## 4.2 Simulation of an AWGN channel

The simulation is carried out with the aid of the MATLAB functions presented by Benedetto et al. in [Bene 04]. The considered waveform is a second derivative Gaussian pulse, generated using the function cp0201\_waveform, with length 500 ps and shape factor 250 ps. This is the pulse waveform seen at the output of the antenna at the receiver, when a first derivative Gaussian pulse, often referred to as a monocycle, is generated by the transmitter and sent to the transmitting antenna [Sche 00].

Propagation over an ideal channel is considered. This means that the channel is assumed to be multipath free. The reference pathloss  $PL_0$  is set to 40 dB and the pathloss exponent  $\gamma$  is 2. To estimate the  $SNR_p$  of the received signal at distance D we need to calculate the received signal strength and estimate the noise level.

The first step is to generate the received waveform. The amplitude of the waveform is limited by the target maximum average emitted power and the Pulse Repetition Frequency (PRF). The average emitted power is set to -30 dBm, which is taken from examples in [Bene 04]. PRF is set to 1 MHz. A plot of the resulting waveform  $S_{TX}$  can be found in figure 4.1.

The next step is to estimate the  $SNR_{p0}$ , which is the  $SNR_p$  with 1 m separation between the nodes. The signal has been attenuated by the channel gain, which is found to be

$$c_0 = 10^{(-40/20)} \tag{4.1}$$

$$\alpha(1) = \frac{c_0}{D} = 0.01 \tag{4.2}$$

The received signal can then be found by

$$S_{RX1} = \alpha \cdot S_{TX} \tag{4.3}$$

The noise is assumed to be thermal noise, so $\sigma_n = 44.6 \,\mu\text{V}$. A plot of the received and noisy signal at distances of 1-20 meters is shown in figure 4.1. The $SNR_{p0}$ can be found from

$$SNR_{p0} = 10 \log\left(\frac{\max(S_{RX1})^2}{\sigma_n^2}\right) \tag{4.4}$$

which in this special case is approximately 27 dB.

The  $SNR_p$  is related to a distance *D* through

$$SNR_p = SNR_{p0} + 20\log\alpha + PL_0 \tag{4.5}$$

where $\alpha$ is dependent on *D* through equation (2.5).



Figure 4.1: Transmitted waveform and example waveforms seen by receivers at 1-20 m. Note that the vertical scale is different on the individual plots.

Actual noise levels might vary from implementation to implementation, and the pathloss is an application specific variable depending on various conditions such as the multipath density and the type of channel (NLOS or LOS). It is therefore useful to use the $SNR_p$ as a quantity instead of actual meters, noise and signal levels. It is a generic variable and independent of application specific conditions. The values derived in this section are provided as an example. For other channel conditions and implementations a given $SNR_p$ will correspond to another separation distance.
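The mapping between distance and $SNR_p$ used in this example can be sketched in a few lines of Python. This is only an illustration of equation (4.5) under the values of this section; the form $\alpha(D) = c_0/\sqrt{D^\gamma}$ is assumed for equation (2.5), which matches equation (4.2) for $\gamma = 2$, and the function names are hypothetical.

```python
import math

SNR_P0_DB = 27.0  # SNR_p at 1 m separation, from equation (4.4)
PL0_DB = 40.0     # reference pathloss PL_0
GAMMA = 2.0       # pathloss exponent of the ideal channel


def channel_gain(distance_m: float) -> float:
    """Amplitude gain alpha(D) = c0 / sqrt(D**gamma); assumed form of equation (2.5)."""
    c0 = 10 ** (-PL0_DB / 20)
    return c0 / math.sqrt(distance_m ** GAMMA)


def snr_p_db(distance_m: float) -> float:
    """Equation (4.5): SNR_p = SNR_p0 + 20*log10(alpha) + PL_0."""
    return SNR_P0_DB + 20 * math.log10(channel_gain(distance_m)) + PL0_DB


for d in (1, 3, 10, 15):
    print(f"{d:>2} m -> {snr_p_db(d):5.1f} dB")
```

For these channel parameters, 3 m, 10 m and 15 m correspond to roughly 17.5 dB, 7 dB and 3.5 dB, which are the distance and $SNR_p$ pairs used in the examples later in this chapter.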

## 4.3 Echo device performance

A fundamental challenge in the echo device is to make it sensitive enough to detect an incoming request within the range of interest, and robust enough to handle noise. The considered system consists only of a quantizer, a delay element and a pulse generator. The sensitivity and robustness are therefore decided by the threshold in the quantizer.

As shown in section 2.4.2 the quantized signal is vulnerable to two kinds of errors, namely false alarms and missed detections. Early false alarms are a fundamental limiter to the complete scheme, and should be kept at a minimum. The trick is to set the threshold to a level yielding a low $P_{failure}$, while keeping $P_{md}$ at an acceptable level. To investigate $P_{failure}$ we use equation (3.4). T is swept for ranges of interest. It is assumed that t = 0 is equal to a separation distance of 0 m, and t = T equals (c · T) m where *c* is the speed of light in vacuum. $\lambda$ is harder to approximate. In [Lee 02] it is approximated by computer simulation of an AWGN vector. We rely on the values derived there as an approximation to the noise. By sweeping the distance between the devices and the threshold $\hat{\theta}_t$ we get the plot in figure 4.2. As expected, the probability approaches 1 when the threshold is lowered. Assume now that a failure rate below 1% is desired. The question is: what is the lowest threshold setting that can be used to achieve such a goal? In figure 4.3 a zoomed version of figure 4.2 is shown. Setting $\hat{\theta}_t$ to 2 keeps us below 1% for ranges at least up to 20 meters.

Figure 4.2: Sweep of threshold and time

Unfortunately, the noise is not the only limiting factor on range. Once the amplitude of the signal is lower than the threshold, the missed detection probability $P_{md}$ increases rapidly, since the signal now only crosses the threshold when noise added to the signal "helps" the signal across.

Because of the shape of the pulse, $P_{md}$ varies for the different parts of the pulse. For the pulse shape shown in figure 4.1 the signal will have a higher probability of crossing the threshold near the center of the pulse than in the weaker parts surrounding the center. To simplify the further analysis it is assumed that $P_{md}$ can be approximated by the peak amplitude of the pulse, using equation (2.16). By sweeping distance and threshold, the plot in figure 4.4 is produced. In figure 4.5 slices of $P_{md}$ with $\hat{\theta}_t = [2, 3, 4]$ are shown. From the two plots we clearly see the effect of the thresholding. As the signal strength declines below the threshold level, $P_{md}$ increases rapidly and approaches 1. The figures are shown with actual meters instead of $SNR_p$ to make them easier to understand. Note that different channel conditions will lead to different results in terms of actual meters. The shape of the curve and the relative effect of changing the threshold still apply, though.

Figure 4.3: Failure probability, zoomed

Figure 4.4: Probability of a pulse passing the quantizer without detection as a function of threshold and distance, plotted as actual meters to make the plot more understandable.



Figure 4.5: Probability of the quantizer not detecting a pulse as a function of increased distance with  $\theta_t = [2, 3, 4]$ 

#### 4.3.1 Delta error

Because of the shape of the pulse, the quantizer in the echo node adds an uncertainty to $\Delta t$. This is illustrated in figure 4.6. The signal crosses the threshold earlier for the high $SNR_p$ signal than for the low $SNR_p$ signal. In reality, the signal might cross the threshold anywhere within the positive part of the pulse. This error is hard to work around, but it can be predicted to a certain extent if some a priori knowledge of the pathloss is present.

In any case, the error is relatively small. In the pulse shape shown in figure 4.1 the positive part of the pulse (the part above DC) lasts roughly 150 ps. If we can assume that the threshold is crossed somewhere in the first half of the positive part, the error made is 75 ps at most. This translates into approximately 22.5 mm of spatial resolution.
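As a quick check of this figure, the timing error can be converted to distance by multiplying with the propagation speed. The symbols $\Delta t_{err}$ and $\Delta d$ are introduced here for illustration only, and $c \approx 3 \cdot 10^{8}$ m/s is assumed:

$$\Delta d = c \cdot \Delta t_{err} \approx 3 \cdot 10^{8}\,\text{m/s} \cdot 75 \cdot 10^{-12}\,\text{s} = 22.5\,\text{mm}$$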



Figure 4.6: Shows how the delta error is manifested for high and low $SNR_p$s. The arrows indicate the time of the threshold crossing.

## 4.4 Sampler strategies in Main Device

In this section different sampler strategies are analyzed, and their potential performance in the Active Echo scheme is evaluated. To simplify the discussions it is assumed that the echo device is ideal in the sense that early false alarm errors are not considered. Missed detections will however be taken into account.

The samplers themselves are not described in detail in this section. More detailed descriptions of the samplers can be found in chapter 2.

#### 4.4.1 Delay line sampler

The output of the delay line sampler is represented by a vector **v**. Each index of **v** represents the output of a sampler in the delay line, where $\mathbf{v}_0$ is the first and $\mathbf{v}_L$ is the last sampler of a delay line of length *L*. Each element $\mathbf{v}_i$ can hold a value of either 0 or 1.

The vector **v** holds a sequenced "image" of the state of the quantizer from time $t = t_{start}$ to $t = t_{start} + t_s \cdot L$, where $t_{start}$ is the time of the strobe signal from the control logic triggering the sampling. The resolution of this image is decided by the delay $t_s$ between each sample, and is actually the spatial resolution of the ranging.

To estimate the arrival time of the pulse, the sampled signal is analyzed. The easiest way of realizing this is to locate the first threshold crossing by searching $\mathbf{v}$ for the first 1. Clearly this introduces an extra probability of failure, since noise energy could make the signal cross the threshold in the main device, just as in the echo device. Since the signal now is stored as a sequence of discrete time samples, we can estimate the probability of a sampler holding a 1 by (2.13). In the echo node, the noise is a continuous signal and the failure probability therefore has to be treated as threshold crossings of a continuous process. However, when the signal has been stored in **v**, the elements of **v** can be treated as uncorrelated stochastic variables. The probability of an early false alarm in the main device is the probability that at least one of the samplers $\mathbf{v}_{0}...\mathbf{v}_{i_{TOA}}$ has the value 1, where $i_{TOA}$ is the index of the sampler corresponding to the arrival time of the echo pulse, found by $i_{TOA} = \frac{2D/c}{t_s}$, assuming the delay introduced by the echo device ($\Delta t$) has been removed by an initial delay. The false alarm probability $P_{famain}$ can be found through

$$P_{famain} = 1 - (1 - \hat{P}_n)^{i_{TOA} + 1} \tag{4.6}$$

where $\hat{P}_n$ is the probability of a threshold crossing in the noise only part of the signal. Increasing the distance results in increased $P_{famain}$, since increased distance means increased $i_{TOA}$. The input to the quantizer will of course be noisy, meaning the threshold has to be set somewhere above the noise floor. Like in the echo device, $P_{md}$ increases rapidly for signal strengths below the threshold.
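To get a feeling for how quickly $P_{famain}$ grows with distance, equation (4.6) can be evaluated numerically. The sketch below is an illustration only: the function names are hypothetical, and the noise-only crossing probability is taken as the Gaussian tail probability for a threshold given in units of $\sigma_n$, which is an assumption standing in for equation (2.13).

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def gaussian_tail(x: float) -> float:
    """Probability that a zero-mean, unit-variance Gaussian sample exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def toa_index(distance_m: float, t_s: float) -> int:
    """Index of the sampler at the echo arrival time, i_TOA = (2*D/c) / t_s."""
    return round(2 * distance_m / C / t_s)


def p_fa_main(distance_m: float, t_s: float, p_n: float) -> float:
    """Equation (4.6): probability that at least one of v_0..v_iTOA holds a 1."""
    return 1 - (1 - p_n) ** (toa_index(distance_m, t_s) + 1)


# Example values (assumptions): t_s = 20 ps, threshold at 4 sigma_n.
p_n = gaussian_tail(4.0)
for d in (1, 5, 10, 20):
    print(f"{d:>2} m: i_TOA = {toa_index(d, 20e-12):>5}, "
          f"P_fa_main = {p_fa_main(d, 20e-12, p_n):.3f}")
```

Because the number of samplers searched grows linearly with distance, even a small per-sampler crossing probability adds up over a long delay line.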

#### 4.4.2 Integrating sampler

The integrating sampler, as described in section 2.5.2, works by repeating the measurement N times and integrating the result between each run. The threshold in the main node can be set permanently to 0, while the threshold in the echo node is set permanently to an appropriate value, keeping the failure probability below a desired value. The result can now be evaluated by comparing the set of N runs.

Although the missed detection probability increases rapidly as the distance between the two devices is increased and the signal strength decays to levels below the threshold, the probability of a threshold crossing is always higher where there is a pulse present than in the case of a noise only signal. As long as the distance between the devices has remained the same, it can be assumed that the echo pulse appears at the same time for all *N* integrations.

In the delay line sampler, the value of a sample is limited to 0 or 1. In the integrating sampler, the value is an integer between 0 and N. Each sample is a binomially distributed variable, which means that the mean value is given by

$$\mu_{\mathbf{v}_i} = N\hat{P}_i \tag{4.7}$$

and the standard deviation is

$$\sigma_{\mathbf{v}_i} = \sqrt{N\hat{P}_i(1-\hat{P}_i)} \tag{4.8}$$

where  $\hat{P}_i$  is the probability of a threshold crossing in sampler *i*.

In the noise only part of the input (the part before and after a pulse arrives), this probability is  $\hat{P}_n = 0.5$  as long as the threshold is 0 (DC).

For the portion of the input with deterministic threshold crossings caused by a pulse, the probability is a function of signal level, the threshold in the echo device  $\theta_{ED}$  and the noise.

Since the signal is evaluated over the entire timespan of the received pulse in the main device, the missed detection probability  $P_{mdmain}(i)$  is a function of the input pulse shape, meaning the different parts of the pulse have different probabilities of crossing the threshold. If we sample the pulse waveform, it can be estimated by equation (2.16) for each of the samples.

It is now possible to estimate the probability of a threshold crossing in the quantizer. In the presence of a pulse it can be approximated by

$$\hat{P}_s(i) = \left[ (1 - P_{mdecho}) \cdot (1 - P_{mdmain}(i)) \right] + \hat{P}_n \cdot P_{mdecho} \tag{4.9}$$

where  $P_{mdecho}$  is the missed detection probability in the echo device.

Assume that a pulse arrives at the time  $t = t_a$  and has the duration  $t_m$ . The threshold crossing probability is then

$$P_{tc}(t) = \begin{cases} \hat{P}_s(t) & t_a < t < t_a + t_m \\ 0.5 & \text{otherwise} \end{cases} \tag{4.10}$$
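Equations (4.7) to (4.10) are easy to evaluate numerically. The following is a minimal sketch with hypothetical function names; the missed detection probabilities in the example are made-up illustration values, not results from this thesis.

```python
import math
from typing import Tuple


def p_crossing_pulse(p_md_echo: float, p_md_main: float, p_n: float = 0.5) -> float:
    """Equation (4.9): crossing probability for a sampler inside the pulse."""
    return (1 - p_md_echo) * (1 - p_md_main) + p_n * p_md_echo


def sampler_stats(n_runs: int, p: float) -> Tuple[float, float]:
    """Equations (4.7) and (4.8): mean and standard deviation after n_runs integrations."""
    return n_runs * p, math.sqrt(n_runs * p * (1 - p))


# Noise only sampler, threshold at DC (P_n = 0.5), N = 100
print(sampler_stats(100, 0.5))                 # (50.0, 5.0)

# Sampler inside the pulse, illustrative values P_mdecho = 0.2, P_mdmain = 0.1
p_s = p_crossing_pulse(0.2, 0.1)
print(p_s, sampler_stats(100, p_s))
```

With these illustrative values the pulse sampler has a mean of about 82 against 50 for the noise only samplers, several standard deviations apart, which is what makes the pulse stand out after integration.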

It was shown in [Hjor 06] that for noisy signals, the permanent threshold sampler behaves almost as well as an analog average sampler. When the signal is sampled *N* times, the noise is therefore reduced by a factor of approximately $\sqrt{N}$. The output of the integrating sampler can be viewed as a recovered version of the signal at the input of the quantizer. The $SNR_p$ of the recovered signal can be found through

$$SNRp_{recovered} \approx \frac{V_p^2}{(\sigma_n/\sqrt{N})^2} = \frac{N \cdot \hat{P}_s^2}{\hat{P}_n(1-\hat{P}_n)} \tag{4.11}$$

Note that this is only correct for cases where the noise is close to or higher than the signal. In these cases SSR effects are present, maximizing the amount of signal passed through the quantizer. In a case where the signal is much stronger than the noise the sampler will clip, resulting in lower $SNR_p$. When the noise approaches zero each sampler will act like a 1 bit ADC, and the integrating sampler acts like the delay line sampler [Stoc 00].
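To make the dependence on *N* in equation (4.11) explicit, insert $\hat{P}_n = 0.5$ for a threshold at DC. Within the validity range stated above, every tenfold increase in the number of integrations then improves the recovered $SNR_p$ by 10 dB:

$$SNRp_{recovered} \approx \frac{N \cdot \hat{P}_s^2}{0.5 \cdot (1 - 0.5)} = 4N\hat{P}_s^2, \qquad 10\log\frac{SNRp_{recovered}(10N)}{SNRp_{recovered}(N)} = 10\ \text{dB}$$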

To illustrate how the output of the integrating sampler might look, the output of each sampler can be represented by its corresponding PDF. Plotting all PDFs next to each other along the x-axis, and coding the different values of the PDF in different shades of grey, results in a so-called grey-map. Higher values result in darker shades and lower values result in lighter shades. In figure 4.7 a grey-map of a section of **v** with $SNR_p = 7$ dB (10 m) and $\theta_{ED} = 2$ is shown. To illustrate the point of the grey-map, the mean value of each sampler is plotted with a solid red line on top of the grey-map. The standard deviation is shown with dashed lines. The number of runs *N* in this particular plot is 100.



Figure 4.7: Example of a grey-map of the sampler states with $SNR_p = 7$ dB (10 m). The mean value for each sampler is shown with a solid red line and the standard deviation is shown with a dashed line. $\theta_{ED}$ is set to two.

To see how the number of integrations affects the signal quality, the recovered version of a signal with $SNR_p = 3.5$ dB (15 m) is shown in figure 4.8, with N set to 100, 1000 and 10000. By visually inspecting the plots it is obvious that the increased number of integrations has made the pulse more distinct.

To illustrate how the clipping behavior of the sampler for high  $SNR_p$  will look, a grey-map for an  $SNR_p = 17.5 \text{ dB}(3 \text{ m})$  is shown in figure 4.9. The five samplers closest to the center of the pulse reach their maximum values and consequently clip the input signal.



Figure 4.8: Grey-maps of the sampler states for a received signal with  $SNR_p = 3.5 \text{ dB}(15 \text{ m})$ 



Figure 4.9: Grey-map showing the output of a clipping sampler with input  $SNR_p = 17.5 \text{ dB}(3 \text{ m})$ 



Figure 4.10: Plot taken from [Stoc 00] showing the transmitted information against  $\sigma_n$  for N = 64

#### 4.4.3 Swept threshold

Finding an analytical expression for the swept threshold sampler is tricky. The performance for different noise and signal conditions has been thoroughly analyzed by Stocks in [Stoc 00], and we will rely on the results from his work here.

Consider the plots in figure 4.10, which have been copied from [Stoc 00]. In each plot the threshold is swept over the range $\pm 1$, and the input signal is a random and uniformly distributed signal between limits $\pm L$. The Y-axis represents the amount of information transmitted through the system in bits. Although this might not be directly relevant to the Active Echo scheme, it reveals important information on how the swept threshold performs compared to the simpler permanent threshold.

Figure 4.10(a) shows the case where $L \le 1$, meaning the threshold is swept over a *larger* range than the signal level, and figure 4.10(b) shows the case where $L \ge 1$, meaning the threshold is swept over a *smaller* range than the signal level. The two plots show the importance of appropriate threshold setting. Setting the sweeping range of the threshold too large could in the worst case lead to complete loss of the incoming signal, while for small sweeping ranges the performance is similar to the permanent threshold.

An important observation that can be made from figure 4.10(b) is that as $\sigma_n$ approaches 1, the swept threshold delivers no extra performance over the permanent threshold (where $L \rightarrow \infty$). For $\sigma_n = 0.6$ and above the two schemes perform equally well. In our convention this corresponds to an $SNR_p$ of approximately 4.4 dB.

#### 4.4.4 Adaptive thresholding

In the previous section we saw how threshold sweeping can lead to more accurate signal reconstruction for $SNR_p$s higher than approximately 4.4 dB. Initially, it might not be obvious how Active Echo can benefit from this, since it is accurate *time* information and not amplitude information we are after.

There is however an aspect that has yet to be considered. Setting thresholds can be a tricky business, and the echo node in particular relies on accurate threshold setting to obtain optimal performance. The internal noise level will vary between implementations, making it hard to predict. In addition, there is noise from external sources. This calls for a more adaptive threshold scheme.

As long as the noise can be assumed to follow a predictable distribution, such as for instance Gaussian noise, the output of a quantizer is predictable for noise only signals. By sweeping the threshold and at the same time monitoring the output of the quantizer, for instance by sampling it in a delay line sampler, it should be possible to simply stop the sweep when the output of the quantizer satisfies a certain criterion. An example of such a criterion could be that a desired value is achieved in the delay line, corresponding to the desired probability distribution. For instance, if the signal is observed for X occurrences and the threshold is set somewhere close to $2 \cdot \sigma_n$, approximately 2.3% of the occurrences should be 1, and the rest should be 0.
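The stop criterion described above can be sketched in software as follows. This is a simplified illustration, not the hardware procedure: the function names are hypothetical, the threshold is expressed in units of $\sigma_n$, Gaussian noise is assumed, and the same recorded noise samples are reused for every threshold step, whereas a real device would observe fresh noise at each step.

```python
import random


def fraction_of_ones(samples, threshold):
    """Fraction of quantizer outputs that are 1 for a given threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)


def sweep_threshold(samples, thresholds, target_fraction):
    """Sweep the threshold downwards and stop before the fraction of ones
    exceeds target_fraction.

    Returns the last threshold that satisfied the criterion, or None if
    even the highest threshold produced too many ones.
    """
    chosen = None
    for th in sorted(thresholds, reverse=True):  # start high, sweep down
        if fraction_of_ones(samples, th) > target_fraction:
            break
        chosen = th
    return chosen


# Illustration: unit-variance Gaussian noise, target of about 2.3 % ones.
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(10_000)]
steps = [k / 10 for k in range(50, 0, -1)]  # 5.0, 4.9, ..., 0.1
print(sweep_threshold(noise, steps, target_fraction=0.023))
```

With a target fraction around 2.3% the sweep is expected to stop in the vicinity of $2 \cdot \sigma_n$, without the noise level having been known in advance.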

## 4.5 Estimating the ToF

The Time of Flight between the two nodes is estimated by searching **v** for the correct index $i_{TOA}$. The ToF is found from the difference between $i_{TOA}$ and the index representing the transmission time of the request pulse, multiplied by $t_s$. If $\Delta t$ is compensated for by an initial delay, the ToF can be found through $\frac{i_{TOA} \cdot t_s}{2}$, since the sampling sequence was initiated at the time of transmission.
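As a worked example with illustrative numbers (assumed here: $t_s = 20$ ps, $i_{TOA} = 3336$ and $c \approx 3 \cdot 10^{8}$ m/s):

$$ToF = \frac{i_{TOA} \cdot t_s}{2} = \frac{3336 \cdot 20\,\text{ps}}{2} = 33.36\,\text{ns}, \qquad D = c \cdot ToF \approx 10\,\text{m}$$

With the same assumptions, one step in $i_{TOA}$ corresponds to $c \cdot t_s / 2 \approx 3$ mm of distance.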

There are several possible ways of finding  $i_{TOA}$ . We will discuss two possibilities here, and they are:

- Thresholding
- Max Selection

#### 4.5.1 Thresholding

Thresholding means that the content of **v** is compared to a threshold. Comparing **v** to a predefined and permanent threshold is the simplest and least complex algorithm, because it can be completed in linear time. **v** is searched serially until the threshold is crossed, which requires at most *L* operations, where one operation is defined as comparing **v**<sub>*i*</sub> to the threshold and *L* is the length of the delay line. The drawback is that it requires the setting of a threshold, which can be non-trivial.

To increase the robustness of the threshold crossing algorithm, a threshold crossing can be verified by comparing the value of the chosen sample to its neighbors. Given the expected pulse shape and width, the surrounding samples should be within some threshold, either above or below the current sample. It is fair to assume that the first threshold crossing of a pulse happens either at the rising edge or at the peak of the pulse, so as long as the sampling rate is high enough to capture at least two samples within the positive part of the pulse, this should be detectable. If this verification process does not fulfill a given criterion, the search is continued. This verification adds some complexity to the algorithm.
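A serial threshold search with neighbor verification could look like the sketch below. The specific criterion used here (the sample following the crossing must also reach a given level) is only one hypothetical choice among many, and the function and parameter names are illustrative.

```python
from typing import Optional, Sequence


def first_crossing(v: Sequence[int], threshold: int,
                   verify_level: Optional[int] = None) -> Optional[int]:
    """Return the index of the first (verified) threshold crossing in v, or None.

    v            -- sampler values (0/1 from a delay line, or integrated counts)
    threshold    -- level a sample must reach to count as a crossing
    verify_level -- if given, the following sample must reach this level too;
                    a hypothetical neighbor criterion
    """
    for i, value in enumerate(v):
        if value < threshold:
            continue
        if verify_level is None:
            return i
        if i + 1 < len(v) and v[i + 1] >= verify_level:
            return i
        # Verification failed: treat as an isolated noise spike, keep searching.
    return None


counts = [50, 48, 71, 52, 49, 70, 85, 78, 51]   # made-up integrated counts
print(first_crossing(counts, threshold=70))                    # 2
print(first_crossing(counts, threshold=70, verify_level=65))   # 5
```

In the example the isolated value at index 2 is accepted by the plain search but rejected once verification is enabled, at the cost of one extra comparison per candidate.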

It is also possible to implement an adaptive threshold scheme. It has a higher complexity than a permanent threshold scheme, but is more robust and flexible. The difference between a permanent threshold and an adaptive threshold scheme is that the adaptive threshold scheme contains some sort of feedback, adjusting the threshold according to the output of the decision logic. The threshold can be adjusted linearly, like in a swept threshold sampler, starting with either a high or a low threshold and then adjusting the threshold up or down until the output conforms to a given criterion. The flexibility of an adaptive threshold scheme makes it particularly fitting for a network with high mobility. Continuously adapting the threshold to the state of the samplers could compensate for varying distances between the nodes.

#### 4.5.2 Max selection

Instead of comparing $\mathbf{v}$ with a predefined threshold, the algorithm can select the sampler with the highest value. This requires the algorithm to search through the entire $\mathbf{v}$. It is simple to implement and completes in linear time, just like the thresholding scheme. In contrast to the thresholding scheme though, no threshold has to be set prior to the ranging. It can also be extended with verification similar to the thresholding algorithm, examining the neighbors to see if they follow some predefined criteria.

If the input signal is noisy, and a sufficiently high number of integrations is used in the sampler, the maximum of the recovered signal should be the center of the returned pulse. Knowing this, it should ideally be possible to obtain a precision of $\pm 0.5 \cdot t_s$ for a sampling interval of $t_s$, since the worst case sampling time is when the peak of the pulse appears exactly in the middle of two samples. When the input signal has a high $SNR_p$, the recovered signal might clip, resulting in multiple samplers reaching maximum. It is possible to compensate for this by for instance choosing the middle of the clipping samplers.
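A minimal sketch of Max selection with the clipping compensation mentioned above is given below. The function name is hypothetical, and the sketch assumes that clipped samplers form one contiguous run starting at the first maximum.

```python
from typing import Sequence


def max_selection(v: Sequence[int]) -> int:
    """Return the index of the largest value in v.

    If several neighboring samplers share the maximum (clipping), the
    middle of that run is returned.
    """
    peak = max(v)
    start = list(v).index(peak)          # first sampler holding the maximum
    end = start
    while end + 1 < len(v) and v[end + 1] == peak:
        end += 1                         # extend over the clipped run
    return (start + end) // 2


# Made-up sampler counts with N = 100
print(max_selection([50, 52, 49, 60, 83, 91, 80, 58, 50]))              # 5
print(max_selection([50, 52, 49, 60, 100, 100, 100, 100, 100, 61, 50])) # 6
```

The whole vector is scanned once to find the maximum, so the run time stays linear in the length of the delay line.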

#### 4.5.3 Simulated performance

The simplicity of the Max selection algorithm makes it a good candidate for the Active Echo scheme. To investigate how it performs under various noise conditions, a computer simulation is performed for different values of N. Note that this simulation only shows how the algorithm performs within a single pulse, and not for an entire delay line. However, as long as the $SNR_p$ of the recovered signal is good enough this provides a good illustration of the overall performance.

The simulation is based on a permanent threshold sampler with  $t_s = 20$  ps and  $\theta_{ED} = 2$ . Each simulation is averaged over 500 channel realizations, and the RMS error is calculated by

$$\sigma_t = \sqrt{\left(\sum_{r=1}^{500} (\hat{t}_r - t)^2\right) / 500} \tag{4.12}$$

where  $\hat{t}_r$  is the estimated ToF for each channel realization r. A plot of the simulation results can be found in figure 4.11.
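Equation (4.12) translates directly into code. The helper below is a hypothetical illustration and not the simulation code used for figure 4.11; the example estimates are made up.

```python
import math
from typing import Sequence


def rms_error(estimates: Sequence[float], true_value: float) -> float:
    """Equation (4.12): RMS deviation of the estimated ToF from the true ToF."""
    squared = sum((t_hat - true_value) ** 2 for t_hat in estimates)
    return math.sqrt(squared / len(estimates))


# Four made-up ToF estimates in picoseconds around a true ToF of 1000 ps
print(rms_error([980.0, 1000.0, 1020.0, 1000.0], 1000.0))
```

In the simulation described above, `estimates` would hold the 500 ToF estimates obtained from the individual channel realizations.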

Note that just observing the plot alone might lead to the assumption that an N of only 10 provides only 150 ps of RMS error for signals with negative *SNR*. It is however important to appreciate the fact that the simulation was performed for a pulse only part of an input signal. As the error begins to climb, there is an increasing chance that a sampler in the noise only part of the input is chosen by the Max selection algorithm. The plot does however illustrate the fact that with a high enough *N* even signals with negative *SNR*<sub>p</sub> can be retrieved.

## 4.6 Regulatory considerations

The pulse shape used in the examples considered in this chapter is the second derivative Gaussian pulse. As explained earlier, this is the shape seen by the receiver when a monocycle is sent to the transmitting antenna. In practical realizations it is however common to use higher order derivatives of the pulse because they provide a better fit to the regulatory frequency masks. The Gaussian pulse and its first fifteen derivatives are shown in figure 4.12.

Figure 4.11: Simulated RMS error in estimated ToF for increasing SNR and N from 10 to 10 000.

The shape of the incoming pulse is critical in Active Echo. If higher derivatives are used, this has to be considered in both devices. In the main device, the ToF algorithm may need to be adjusted to compensate for the different pulse shapes. This should however be doable with little extra effort. In the echo device, the impact of more complex pulse shapes is minimal for high *SNR* signals, since the echo device will trigger on the rising edge of the first part of the pulse. For weaker signals however, the pulse shape leads to an increased delta error.

## 4.7 Sources of Error

Some error sources are hard to work around, such as those bound to physical properties of the channel. Others are related to the properties of the specific implementation. In this section, some potential error sources will be identified.



Figure 4.12: The Gaussian pulse and its first 15 derivatives. The pulse shape considered in this thesis is highlighted.

#### 4.7.1 Signal velocity

So far, the signal has been assumed to move with the speed of light in vacuum (c). In air the speed is actually slightly less than c, but the difference is small enough to be unnoticeable for most applications. Variables such as temperature, humidity and air pressure might also affect the speed to a certain extent. When the signal travels through a medium other than air, the speed could drop considerably. In these cases the signal will usually also lose a lot of strength. The performance of Active Echo under such channel conditions will be further discussed in a separate section (4.8).

#### 4.7.2 Delta error revisited

The delta error, as described in section 4.3.1, has the consequence that even if we can obtain perfect ToA estimation of the echo pulse in the main device, the delta error still limits the achievable precision. This is illustrated in figure 4.13. The offset can however be predicted according to the pulse shape and subtracted from the final result. If higher precision is desired, the delta error has to be estimated and added to the offset before it is subtracted. The simplest way of doing this is to assume that the pulse was detected at its peak value, which is usually the middle of the pulse. This does however mean that a larger error is added for shorter ranges than for longer ranges, where the signal is weaker and the threshold crossing in the echo node happens closer to the center of the pulse.



Figure 4.13: Shows how the delta error is seen in the main device. The solid and dashed pulses illustrate the variance in arrival time caused by the delta error.

## 4.8 Performance in non-ideal channel conditions

The results from the preceding sections apply to an ideal channel. Non-ideal channel conditions might lead to poorer performance, and in this section two non-idealities will be introduced and their impact on the proposed scheme will be estimated.

#### 4.8.1 Multipath channels

In real life, channels are affected by multipath and objects might be blocking the direct path of the signal. The two special cases of multipath, NLOS and LOS, will be addressed in the following sections.

#### Line of Sight

In the case of LOS, the direct path from sender to receiver is the strongest. Depending on the environment, the amount of multipath will vary. The multipath will however always arrive later than the direct path. It can therefore be assumed that the echo device is unaffected by multipath in the case of LOS. In a dense multipath environment, this assumption might not be entirely correct for weak signals. Assume for instance that a pulse is missed by the quantizer. Depending on the strength of the following multipath component, there is a possibility that this is picked up by the quantizer instead. This does however require a strong multipath component. Either way, the multipath components will have lower probabilities of triggering the quantizer in the echo device than the direct path component, meaning the error will decay with increasing N in an integrating sampler.

#### Non Line of Sight

NLOS imposes a great challenge for the ranging scheme. The direct path is no longer the strongest component, and in some cases there might not even be any direct path component at all. In the case where the direct path is a weaker version of the succeeding multipath components, the multipath channel introduces a non-linearity to the ranging procedure. This is the case because the direct path might be strong enough to cross the threshold in the echo node for short distances, while it is the stronger multipath component that crosses the threshold for longer distances. If the direct path is completely blocked, the ranging will return a distance with a positive error added to it.

There is however some light at the end of the tunnel here. Although the direct path is the most common quantity to base distance measurements upon, the user might not always want to know the distance in a straight line between the devices, but rather the actual distance one would have to move.

#### 4.8.2 Multi-User interference

Multi User Interference (MUI) comes from noise in the physical medium generated by neighboring devices. In a sensor network consisting of hundreds or even thousands of nodes, the amount of MUI is potentially devastating if not controlled. Active Echo is sensitive to MUI, and the echo device in particular needs to be in a relatively quiet environment to keep $P_{failure}$ at a minimum. Predicting or estimating MUI is challenging, and will only be briefly discussed here.

The sources of MUI in a WSN can be split into two main groups. Interference created by other nodes in the same network can be avoided by incorporating some sort of time slotting mechanism. In a decentralized and self-organized network it might also be possible for the main device to broadcast a beacon of some sort, ordering all nodes within range to stand back for a given time period while the main and echo device perform the ranging.

Interference created by devices occupying the same frequency spectrum, but without any relation to the network is however harder to work around. A common method of modeling MUI is by Standard Gaussian Approximation (SGA), where all interfering contributions are treated as Gaussian noise with uniform PSD over the frequencies of interest [Bene 04]. This means that the SGA noise is added to the receiver noise. The  $SNR_p$  of a MUI affected receiver is therefore:

$$SNR_p = \frac{V_{peak}^2}{\sigma_n^2 + \sigma_{mui}^2} \tag{4.13}$$

where  $\sigma_{mui}^2$  is the equivalent noise power of the MUI.

Modeling MUI this way means that the estimations done in this chapter based on $SNR_p$ are still valid, as long as $\sigma_{mui}^2$ is added to $\sigma_n^2$.
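As an illustration of equation (4.13), consider the hypothetical case where the equivalent MUI noise power equals the receiver noise power, $\sigma_{mui}^2 = \sigma_n^2$. The $SNR_p$ is then halved, which corresponds to a loss of about 3 dB:

$$SNR_p = \frac{V_{peak}^2}{\sigma_n^2 + \sigma_n^2} = \frac{1}{2} \cdot \frac{V_{peak}^2}{\sigma_n^2}, \qquad 10\log\frac{1}{2} \approx -3\ \text{dB}$$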

## 4.9 Summary

The simplicity of the echo device come at a cost, and to achieve an acceptable failure rate the threshold has to be set higher than the noise floor. For shorter ranges this is not an issue though, since the  $SNR_p$  is high enough to ensure threshold crossings. For a typical AWGN channel, this behavior seize to exist for ranges above approximately 4 m with a threshold at approximately  $2 \cdot \sigma_n$ , where the scheme has to rely on the phenomenon known as Stochastic Resonance (SR). It is however shown that with a high enough N even signals with negative  $SNR_p$  can be retrieved and accurate timing and thus distance estimation can be achieved.

Setting thresholds is an issue, and the echo node in particular is sensitive to accurate threshold setting. Since noise levels can be unpredictable, especially if MUI is considered, it is advisable to incorporate some sort of adaptive threshold scheme. An algorithm is proposed that exploits the already existing hardware to achieve a threshold setting without the need for a priori knowledge of the noise.

As with all RF based ranging schemes, NLOS is an issue, and for extreme cases of NLOS where the direct path is either completely blocked or significantly reduced, accurate distance estimation is in practice unachievable.


For milder cases of NLOS, where the direct path is present but weaker, the proposed Max selection algorithm could locate the strongest path and wrongly assume this to be the direct path. To avoid this, verification can be used to search samplers prior to the strongest path for earlier path arrivals. This particular problem is explored in greater detail in the work by Lee and Scholtz in [Lee 02].

When choosing a sampler strategy in the main device, the choice stands between the permanent and the swept threshold integrating sampler. It is hard to draw a final conclusion here based on the relatively simple estimations and analysis performed in this chapter. It is however obvious that the swept threshold provides better signal reconstruction for short range (high $SNR_p$) signals. To a first approximation, this extra information might not seem very useful. There are however situations where it could prove useful. In addition to the mentioned adaptive thresholding, amplitude information is important when resolving multipath errors, particularly in the weaker cases of NLOS.

## **Chapter 5**

# **Circuit implementations**

by Nikolaj Andersen and Håkon K. Olafsen

Body biasing of MOS transistors is an increasingly popular technique, since it allows adjustment of the speed and/or the static power consumption of the devices after production. In this chapter we will explore this technique by measurements on a ring oscillator. In addition a Time Difference Measuring Circuit utilizing the adjustable gate delay in a body biased inverter is implemented and examined.

## 5.1 Chapter overview

This chapter is split into two main parts. First a ring oscillator consisting of inverters with tunable gate delay is presented, complete with both simulation and measurement results. The tunable gate delay is achieved through Body Bias (BB).

Secondly, a novel Time Difference Measuring Circuit (TDMC), utilizing BB to create small delay differences in parallel delay chains is presented. The primary goal of this implementation is to investigate if the scheme is viable. The achievable time resolution will also be discussed.

The implemented circuits and the content of this chapter are a cooperative work by Nikolaj Andersen and Håkon Olafsen. Both authors have contributed to the entire process, from chip layout to PCB design and measurements. The chapter is also written in cooperation, and is included in both theses.

## 5.2 Ring oscillator

The tapped delay line is a relatively common structure, used in for instance rake receivers [Limb 05] and radars [Hjor 06]. A delay line is a series of delay elements, usually realized with an even number of inverters. To make the delay through these delay elements tunable after production is a desirable feature in many applications, in addition to being useful as a compensation for process variations. Varying the delay through an inverter by altering for instance the supply voltage or limiting the current is feasible, but by doing this we add additional elements to the signal path and/or reduce the signal swing of the circuit. Instead of adding extra devices to the circuitry, the body of the transistor, also known as the back gate, can be biased to tune the delay through the device.

## 5.2.1 Introduction

A ring oscillator is a simple structure consisting of an odd number of inverters connected to each other as a ring. It is often used to characterize technologies. This is done by measuring the gate delay through the inverter used in the circuit. It is a well known and widely used circuit when measuring gate delays, and has been used since the early days of integrated circuits [Forb 73, Fang 75].

Delay through digital gates is normally described by the average gate delay  $\tau$ , which is the average of the rise time  $\tau_{LH}$  and the fall time  $\tau_{HL}$  of the output. The average gate delay of the inverters in a ring oscillator can be calculated from the oscillator's fundamental frequency by the following formula:

$$\tau = \frac{1}{2 \cdot N \cdot f} \tag{5.1}$$

where $\tau$ is the average gate delay, *N* is the number of inverter stages and *f* is the fundamental frequency of the oscillator. Figure 5.1 shows a ring oscillator with 5 stages.
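Equation 5.1 can be sketched in a few lines of Python. The function name is an assumption made for illustration; the example values are the 201-stage oscillator and the approximate simulated (25 MHz) and measured (15 MHz) frequencies reported later in this chapter.

```python
def average_gate_delay(frequency_hz, stages):
    """Average gate delay from equation 5.1: tau = 1 / (2 * N * f)."""
    if stages < 3 or stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages (>= 3)")
    return 1.0 / (2 * stages * frequency_hz)


# Approximate frequencies quoted later in the chapter for the 201-stage ring.
tau_sim = average_gate_delay(25e6, 201)   # simulated, about 25 MHz
tau_meas = average_gate_delay(15e6, 201)  # measured, just below 15 MHz

print(f"simulated: {tau_sim * 1e12:.1f} ps, measured: {tau_meas * 1e12:.1f} ps")
```

With these rounded frequencies the simulated gate delay comes out close to 100 ps and the measured one somewhat above 165 ps.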

## 5.2.2 Motivation

The chip originally included two different TDMCs, one with analog output from an analog integrator (a capacitor), and one with purely digital outputs. Because of the high time resolution of the circuits, which is in the order of 10-20 ps, off-chip generation of input signals with a small enough time difference is a problem. To provide the small time difference needed, inverters with programmable gate delay were included as initial delays to the analog integration circuit. The purely digital circuit uses another type of initial delay, which is described in detail in a later section. To characterize the initial delay inverters, a ring oscillator was included.

Figure 5.1: Typical ring oscillator with 5 stages

The TDMC with analog integrators is illustrated in figure 5.2. The main idea is that when a pulse propagates through each of the delay lines, the two pulses will eventually appear at the same delay element at the same time and create a collision. The duration of the collision is stored as a charge in the capacitor. Collisions appearing frequently at the same spot are rewarded, while collisions appearing at random spots are given less weight.



Figure 5.2: Schematics of TDMC with capacitors as analog integrators

To read out the voltages on the different capacitors, an analog multiplexer (figure 5.3) was implemented together with the circuit. Measurements proved that the selector circuit in the multiplexer did not work as expected, resulting in corrupted voltages at the output of the circuit. Because of limited time, and because the second TDMC circuit has a higher potential resolution, no further effort was put into getting more results from the circuit.



Figure 5.3: Top level of the analog multiplexer

The ring oscillator on the other hand gave promising results. Several aspects of the programmable ring oscillator are of great general interest, and in particular it is of major interest to the second TDMC presented in the last part of this chapter. The following part is devoted to the simulation and measurement results gained from this circuit.

## 5.2.3 Overview of the circuit

The implemented ring oscillator consists of 201 inverters. It is tapped through a buffer consisting of two inverters. This reduces the extra load on the ring oscillator to one extra gate and only adds a small error in gate delay estimation.

Programmability of the gate delay is achieved through BB of both the NMOS and PMOS transistors. When a positive voltage is applied to the body of a transistor, this is referred to as Forward Body Bias (FBB). When FBB is applied to the transistor, the gate delay can be decreased due to the reduced threshold voltage [Oowa 98, Nare 03]. Similarly, applying a negative voltage to the body results in what is called Reverse Body Bias (RBB). When RBB is applied, the threshold voltage increases, and thus the gate delay also increases [Seta 95, Kesh 01]. To separate the body of the transistor from the substrate, the chip is realized in a triple well process. This allows a deep nwell to be used together with an nwell guard ring as isolation. Figure 5.4 shows a cross section of an NMOS transistor in a deep nwell.

The body bias terminals for the NMOS and PMOS transistors are connected to separate pads on the chip. A transistor with body terminals is sometimes referred to as a four terminal device. The fourth terminal, or the body, of the transistor can be referred to as the back gate of the transistor, because it behaves similarly to the front gate. A more detailed description of body biasing will be given in the following sections.

Figure 5.4: Sketch showing the cross section of an NMOS transistor placed in a deep nwell

## **Body biasing**

When applying bias to the transistor bodies, the bias can either be applied to both the NMOS and the PMOS transistors at the same time, or it can be applied to only one of the two. This depends on the intended application of the circuit, and the purpose of applying body biasing. In this circuit, the bias will be applied to both transistors at the same time in a differential manner, meaning that the PMOS bias voltage  $V_{PMOS}$  equals  $V_{DD} - V_{NMOS}$ .

Body biasing is often used to increase the speed or to decrease the off current in digital circuits. Some implementations do both, by applying RBB only when the circuit is in standby, so that leakage is reduced in standby without sacrificing speed in active mode. This dynamic threshold voltage concept was introduced in [Seta 95].

Body biasing also offers advantages in analog applications, such as reducing output resistance in amplifiers, and it can be used in Voltage-Controlled Oscillators (VCOs) [Wann 00]. The use of the body as an active component is sometimes referred to as active well biasing.

Body biasing can also be used to compensate for process variations, see [Mele 04, Brya 01, Gran 06].

As already mentioned, the reduced gate delay is a result of reduced threshold voltage. The threshold is roughly proportional to the square root of the body bias through the following formula

$$V_{th} = V_{t0} + \gamma \left[ \sqrt{V_{bb} + 2\phi} - \sqrt{2\phi} \right] \tag{5.2}$$

where  $\gamma$  is a technology dependent constant often referred to as the body effect constant,  $\phi$  is the Fermi potential and  $V_{bb}$  is the body bias voltage. Using the Shockley model [Shoc 52] as a first order approximation, the strong inversion source drain current of a transistor in the saturation region is given by

$$I_{DS} = \frac{\beta}{2} (V_{GS} - V_{th})^2 \tag{5.3}$$

$$\beta = \frac{\mu \epsilon}{t_{ox}} \frac{W}{L} \tag{5.4}$$

The current thus has a quadratic dependence on the effective gate voltage $V_{GS} - V_{th}$. The Shockley model does not model the behavior in weak inversion, but the so-called EKV model proposed by Enz et al. [Buch 98] models the current in weak inversion as an exponential function of the effective gate voltage.
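The interplay between equations 5.2 and 5.3 can be illustrated with a small Python sketch. It assumes that $V_{bb}$ in equation 5.2 is counted positive for reverse bias, so that a forward bias enters as a negative value. The body effect constant, the Fermi potential and the gain coefficient below are hypothetical values chosen for illustration only, and the function names are likewise assumptions.

```python
import math


def threshold_voltage(v_bb, v_t0=0.24, gamma=0.3, phi=0.4):
    """Threshold voltage following equation 5.2.

    v_bb is assumed positive for reverse bias (forward bias is negative).
    gamma (V^0.5) and phi (V) are hypothetical illustration values.
    """
    return v_t0 + gamma * (math.sqrt(v_bb + 2 * phi) - math.sqrt(2 * phi))


def saturation_current(v_gs, v_th, beta):
    """First order Shockley saturation current, equation 5.3."""
    overdrive = v_gs - v_th
    return 0.5 * beta * overdrive ** 2 if overdrive > 0 else 0.0


beta = 1e-3  # hypothetical gain coefficient in A/V^2
for v_bb in (0.5, 0.0, -0.5):
    v_th = threshold_voltage(v_bb)
    i_ds = saturation_current(1.0, v_th, beta)
    print(f"V_bb = {v_bb:+.1f} V: V_th = {v_th:.3f} V, I_DS = {i_ds * 1e6:.0f} uA")
```

Under these assumptions a forward bias lowers the threshold and raises the saturation current, which is the first order mechanism behind the reduced gate delay.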

When a rising voltage beginning at 0 V is applied to an inverter, the NMOS transistor starts off in cut off while the PMOS is in strong inversion. As the input voltage rises, the NMOS moves through weak inversion before it enters strong inversion, and for a while both transistors are in strong inversion. Eventually the threshold of the PMOS is crossed and only the NMOS remains in strong inversion, while the PMOS passes through weak inversion before it reaches cut off. As device lengths and supply voltages decrease, the strong inversion part of the total operating area decreases. When the threshold voltage is decreased, the strong inversion part is actually increased.

As the threshold voltage is decreased, the off-state leakage current increases. This current can negate some of the delay improvement gained from the increased on-state current in the transistors of the inverter. It will also lead to higher static power dissipation.

Due to short channel effects in modern CMOS transistors, such as velocity saturation and Drain Induced Barrier Lowering (DIBL), the first order Shockley model is inaccurate. In addition, when a positive bias voltage is applied to the bulk, the effective channel length is increased due to decreased depletion regions around drain and source. This can actually suppress second order effects caused by short channels, and affect the threshold voltage considerably. More accurate models for threshold voltage, taking DIBL into account, have been proposed [Liu 93].

Advanced Computer Aided Design (CAD) tools such as Cadence use more advanced simulation models such as the BSIM4 model [Hu 03]. These model the body effect more accurately, taking short channel effects into account. In the BSIM4 model, as in most popular transistor models, the effect of body bias is modeled as a change in threshold voltage. In reality, the back gate acts much like the front gate, and influences the operation of the transistor in other ways as well. This leads to a suspicion that the behavior of a body biased transistor may not be estimated accurately by current computer simulation tools, at least not for the purpose we intend here.

The operating area of the back gate is limited compared to the front gate because of the body-drain and body-source junction diodes. The operating voltages of integrated circuits have declined rapidly in recent years, with modern circuits operating at 1 V and below. The threshold of the junction diode is however a constant determined by the material the chip is realized in (approximately 0.65 V for silicon). This means that the relative operating area of the back gate is larger for lower supply voltages.

As the bias voltage is increased, the power consumption is expected to increase [Liu 00]. This increase can be split into two regions. For low $V_{bb}$, the current consumption comes from off-state leakage and increased parasitic capacitances due to shallower depletion regions. Once $V_{bb}$ reaches the threshold voltage of the bulk-drain and bulk-source diodes, which is approximately 0.6 volts, the power consumption will increase rapidly due to the forward biased diode current. This can also negate the delay improvement, as the bulk-drain current increases and reduces the output swing of the inverter. For these reasons, the bias voltage $V_{bb}$ should be kept below 0.6 volts [Wann 00]. In [Nare 03] an optimum bias voltage of 500 mV is found using measurements in a $0.18 \mu m$ CMOS process. Increasing the bias voltage further only results in decreased performance and a rapid increase in power consumption.

## 5.2.4 Schematic

The inverter used in the ring oscillator is based on a standard inverter from the STM 90nm library, with a fan-out of 1. At minimum length the gate delay is simulated to be under 14 ps with a single inverter as load. Increasing the length of the transistors drastically increases the delay: a fourfold increase in length leads to an increase of more than a factor of seven in gate delay.

| Length (µm) | 0 mV   | 300 mV | 500 mV | 600 mV | 700 mV | $\sigma$ @ 0 mV |
|-------------|--------|--------|--------|--------|--------|-----------------|
| 0.1         | 13.68  | 12.77  | 12.58  | 12.7   | 13.04  | 0.33886         |
| 0.2         | 32.55  | 30.09  | 29.25  | 29.23  | 29.6   | 0.509           |
| 0.3         | 61.2   | 56.77  | 55.02  | 54.75  | 54.96  | 0.824           |
| 0.4         | 100.43 | 93.3   | 90.36  | 89.75  | 89.63  | 1.18618         |
| 0.5         | 151.5  | 141.1  | 136.5  | 135.3  | 134.7  | 1.61685         |

Table 5.1: Simulated gate delay for different FBB voltages and transistor lengths. All delays are in ps

|      | Length (µm) | Width (µm) |
|------|-------------|------------|
| NMOS | 0.400       | 0.600      |
| PMOS | 0.400       | 0.800      |

Table 5.2: Transistor sizes used on the chip

In table 5.1 simulation results for different transistor lengths and FBB voltages are shown. For all transistor lengths, the NMOS width is 0.6 µm and the PMOS width is 0.8 µm. As mentioned before, the inverters in the ring oscillator were originally designed to create inputs for a TDOA measuring circuit. Since the relative delay difference achieved in a long device is higher than for shorter devices, the implemented transistors were chosen longer than minimum. The same total delay difference could have been achieved using a higher number of shorter transistors.
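The claim that longer devices give a larger relative delay difference can be checked directly against table 5.1. The following Python sketch is only an illustration: the data are copied from the table, while the function and variable names are assumptions introduced here.

```python
# Simulated gate delays in ps from table 5.1, indexed by transistor length.
BIAS_MV = [0, 300, 500, 600, 700]
DELAY_PS = {
    0.1: [13.68, 12.77, 12.58, 12.7, 13.04],
    0.2: [32.55, 30.09, 29.25, 29.23, 29.6],
    0.3: [61.2, 56.77, 55.02, 54.75, 54.96],
    0.4: [100.43, 93.3, 90.36, 89.75, 89.63],
    0.5: [151.5, 141.1, 136.5, 135.3, 134.7],
}


def best_bias(length_um):
    """Return the tabulated FBB voltage with the lowest delay and the
    corresponding delay reduction in percent of the zero bias delay."""
    delays = DELAY_PS[length_um]
    best = min(delays)
    bias = BIAS_MV[delays.index(best)]
    improvement = 100.0 * (delays[0] - best) / delays[0]
    return bias, improvement


for length in sorted(DELAY_PS):
    bias, gain = best_bias(length)
    print(f"L = {length} um: fastest at {bias} mV, delay reduced by {gain:.1f} %")
```

Among the tabulated bias points, the shortest device is fastest at 500 mV, while the two longest devices are still getting faster at 700 mV, and the relative delay reduction grows with transistor length.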

In figure 5.5 the delay through the implemented inverter is plotted as a function of body bias voltage, normalized to zero bias. From the plot, there are several interesting observations to be made. As expected, the delay decreases with increased bulk voltage. The decrease is not linear; it appears closer to a quadratic relationship with bulk voltage. At about 0.7 volts, the decrease stops. This is also expected, because of increasing bulk-drain current and leakage through the closed transistor.

There is however a major discrepancy between the results from [Nare 03] and the simulations provided here. Instead of an increase in delay after the minimum point, the simulated delay actually decreases. The most probable explanation of this is that the transistor model used in simulations does not model the behavior of the back gate correctly. Operation of the back gate of transistors in the area above the threshold voltage of the body-drain diode should in any case be avoided since it leads to excessive power dissipation.



Figure 5.5: Simulated delay versus body bias, normalized to peak delay at zero bias

## 5.2.5 Layout

The layout of the circuit is fairly straightforward. As already mentioned, the NMOS transistors are placed in wells. These wells consist of a deep n-well, an n-doped layer beneath the p-doped substrate, forming the bottom of the well. The walls of the well are made of n-doped substrate, overlapping the deep n-well in the bottom to seal the well. See figure 5.6 for an illustration of the cross section of the inverter. The layout of the inverters and the complete ring oscillator circuit can be found in appendix A.

# 5.3 Ring oscillator measurements

## 5.3.1 Measurement setup

The measurement setup consists of a PCB, an oscilloscope and voltage sources for biasing and power. The oscilloscope is connected to the PCB via a coaxial cable, and the power supply through BNC connectors soldered to the PCB. During early measurements, we observed higher harmonic frequencies at odd multiples, 3-7 times higher than the expected oscillation frequency. As it turns out, this is a quite common problem, see for instance [Sasa 82] where harmonic frequencies are observed in a 101-stage ring oscillator. The solution is to slowly ramp the supply voltage in small steps.



Figure 5.6: Sketch showing the cross section of the implemented inverter

Results are read out from the oscilloscope using GPIB with MATLAB.

Pictures of the PCB with the mounted chip are provided in appendix B.

## 5.3.2 Introduction

The measured waveform is shown in figure 5.7. The measurements were repeated on 20 different chips. The resulting measured frequencies are plotted in figure 5.8 for all chips. Two of the chips showed a major deviation in frequency from the rest. Since the chips all come from the same wafer, the most probable explanation for this is mismatch within the wafer.

In figure 5.9, the measured average gate delay is compared to the simulated results. The two deviating chips have been removed from the collection, leaving 18 chips to average over. The two most obvious differences from simulations are the rather big difference between simulated and measured frequency, and the way the measured frequency actually starts to decrease after reaching a maximum. This is in contrast to the simulations, where the slope flattened before the delay decreased further.

## 5.3.3 Gate delay measurements

## Measured gate-delay

Figure 5.7: Measured waveform at the output of the ring oscillator

Figure 5.8: Measured frequency on the y-axis and BB voltage on the NMOS transistor along the x-axis

Figure 5.9: Average gate delay for measured and simulated behavior versus BB on the NMOS transistor

The simulations indicated a fundamental frequency of about 25 MHz at zero biasing, while the measurements show that the real frequency is actually below 15 MHz. This is a 40 percent drop in frequency. The simulations were done without any extracted parasitics, which could explain at least some of this difference. To investigate the large difference between simulated and measured gate delay, the impact of load capacitance is examined below. The estimations are based on the first order Shockley model [Shoc 52], which is widely used in Metal-Oxide-Semiconductor Field-Effect Transistor (MOSFET) analysis and in many textbooks [West 94, John 97]. From [West 94] we get that the transition time of a CMOS inverter can be approximated as:

$$t_{LH} = k_p \cdot \frac{C_L}{\beta_p \cdot V_{DD}} \tag{5.5}$$

$$t_{HL} = k_n \cdot \frac{C_L}{\beta_n \cdot V_{DD}} \tag{5.6}$$

where  $\beta$  is the gain coefficient of the transistors given by:

$$\beta = \frac{\mu\epsilon}{t_{ox}} \frac{W}{L} \tag{5.7}$$

with $\epsilon$ being the permittivity of $SiO_2$, $t_{ox}$ the thickness of the gate insulation, and $\mu$ the electron or hole mobility in the channel of NMOS and PMOS devices. *k* is a constant given by:

$$k = \frac{2}{1-n} \left[ \frac{n-0.1}{1-n} + \frac{1}{2} \ln(19-20n) \right] \tag{5.9}$$

with $n = V_{th} / V_{DD}$.
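As a small numerical illustration of equation 5.9, the constant *k* can be evaluated for the NMOS and PMOS threshold voltages and the supply voltage listed in table 5.3. The function name is an assumption made for this sketch.

```python
import math


def transition_constant(v_th, v_dd):
    """Constant k from equation 5.9, with n = V_th / V_DD."""
    n = v_th / v_dd
    return 2.0 / (1.0 - n) * ((n - 0.1) / (1.0 - n) + 0.5 * math.log(19.0 - 20.0 * n))


# Threshold and supply voltages from table 5.3.
k_n = transition_constant(0.24, 1.0)
k_p = transition_constant(0.29, 1.0)

print(f"k_n = {k_n:.2f}, k_p = {k_p:.2f}")
```

With these values both constants are close to 4, so in this first order model the difference between rise and fall time is dominated by the difference in $\beta$ rather than by *k*.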

The period of the oscillations in the ring oscillator is the sum of the rise and fall times of all *N* stages. Adding the two transition times and multiplying by *N* results in the following equation for the fundamental period of the oscillator:

$$t_{period} = N \cdot (t_{HL} + t_{LH}) = N \cdot \frac{C_L}{V_{DD}} \left[ \frac{k_n}{\beta_n} + \frac{k_p}{\beta_p} \right] \tag{5.10}$$

where *N* is the number of inverters. For our purposes the gate delay is more interesting. The gate delay is given by the average of the rise and fall time of the inverter:

$$t_{gate} = \frac{t_{period}}{2N} \tag{5.11}$$

Rearranging equation 5.11 gives the following equation for load capacitance as a function of gate delay:

$$C_L = \frac{t_{period} V_{DD}}{2} \left[ \frac{k_n}{\beta_n} + \frac{k_p}{\beta_p} \right]^{-1} \tag{5.12}$$

Using this equation results in a mean load capacitance of approximately 21 fF.

The layout of the ring oscillator is shown in figure A.2. The circuit was laid out as one long structure, with one inverter following the next with minimal wiring between the stages. However, the feedback wire is rather long, stretching the entire length of the oscillator, and is the largest single contributor to parasitic capacitance in the complete circuit. Extracting parasitics in Cadence shows that the connections between the inverters add about 2.2 fF of parasitic load, while the long feedback wire adds 27.5 fF. Because the relationship between delay and load capacitance is linear, the large capacitance from the feedback wire can be split into smaller capacitors and distributed to all the inverters in the circuit. This makes the extracted values easier to compare with the measured mean value. Doing this adds only 0.14 fF to each of the inverters, meaning the parasitic capacitance of the feedback wire adds little to the total delay.
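As a quick check of the distribution argument, the extracted feedback wire capacitance can be divided evenly over the 201 inverters of the ring. The symbol $C_{fb,stage}$ is introduced here only for illustration:

$$C_{fb,stage} = \frac{27.5\ \mathrm{fF}}{201} \approx 0.14\ \mathrm{fF}$$

This is more than an order of magnitude smaller than the approximately 2.2 fF contributed by the connections between the inverters.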

| Parameters       | NMOS            | PMOS            |
|------------------|-----------------|-----------------|
| $V_{th}$ (V)     | 0.24            | 0.29            |
| $V_{DD}$ (V)     | 1               | 1               |
| $\epsilon$       | $3.9\epsilon_0$ | $3.9\epsilon_0$ |
| $t_{ox}$         | 16 Å            | 16 Å            |
| $\mu$ (cm²/Vs)   | 500             | 180             |

Table 5.3: Transistor parameter values

Another major contributor to additional delay is resistance in poly and metal wiring. It is not included in the estimation of load capacitance conducted above. Extraction of parasitic resistances in Cadence reveals significant amounts of parasitic resistance, particularly in the contacts between the metal wiring and the gate poly and active drain area of the preceding transistors. This is also confirmed by a closer reading of the process parameters provided by STM. According to these parameters a typical metal to polycide contact has a resistance of 17 $\Omega$.

Supply voltage noise generated by the oscillator itself when the inverters switch is also a potential error source. Because the on-chip wiring and bonding wires also act as parasitic inductors, sudden changes in current such as those generated by a switching inverter will induce a certain voltage drop, which in turn could affect the speed of the oscillation. In addition, the voltage drop over the resistive elements in the on-chip wiring might contribute. This effect is also known as IR-Drop [Khal 06]. Adding better decoupling, both on chip and off chip, and putting more work into the wiring of the supply rails will reduce this problem.

## Maximum oscillator frequency

As mentioned earlier, the frequency of the ring oscillator only increases up to a certain point. In [Nare 03] this is pointed out, and an optimal, or maximum, bias voltage of 500 mV is found. Simulations with minimum length transistors in 90nm give a similar result. Simulations also indicate that longer transistors allow the FBB to be adjusted higher than for minimum length transistors. In both simulations and in [Nare 03], a maximum delay improvement of approximately 10% is found. Measurements however, show that the FBB can be even more effective, with a 25% increase in frequency at maximum.
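To compare the measured frequency increase with the simulated delay improvement on equal terms, note that by equation 5.1 the gate delay is inversely proportional to the frequency. Writing $\tau_0$ and $f_0$ for the zero bias values (symbols introduced here only for illustration), a 25% increase in frequency corresponds to

$$\frac{\tau}{\tau_0} = \frac{f_0}{1.25 \cdot f_0} = 0.8$$

that is, a delay reduction of about 20%, roughly twice the improvement of approximately 10% found in the simulations.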

In figure 5.9 the measured mean delay is plotted along with the simulated delay. When the two curves are compared (see figure 5.11 for a normalized plot) we see that the measured circuit displays some interesting characteristics. The body can be tuned to over 750 mV (PMOS = 250 mV) and the delay still decreases. Since this is above the threshold of the bulk-source and bulk-drain diodes, the power consumption is expected to increase rapidly. In figure 5.10 the measured current consumption of the chip is plotted against the NMOS bias voltage. The absolute current is not interesting here, since the $V_{DD}$ and GND connections are shared by multiple circuits on the chip. It does however confirm the hypothesis that the circuits draw considerably more current when the bias reaches the diode threshold voltage.



Figure 5.10: Current drawn by chip for different bias voltages. Notice the steep increase in current as the voltage passes the diode voltage of approximately 0.65 V

## Process variations and mismatch

An interesting parameter that can be derived from the measurements is the amount of mismatch between the individual chips. A total of 20 chips were available for measurement, and as stated earlier two of them showed a significant deviation from the rest. They have been excluded from the samples, and the standard deviation in delay of the remaining 18 chips is found. In figure 5.12 the standard deviation is shown in seconds and in percent of total delay as a function of body bias voltage. In figure 5.13 a linear fit to the standard deviation is shown, together with the residue of the fit. This is to show that the deviation seems to decay linearly with body voltage. Because of the low amount of current passing through the transistors in weak inversion compared to strong inversion, mismatch and process variations are more pronounced in weak inversion. As explained earlier in this chapter, a body biased transistor has a larger strong inversion area, and this could explain the decay in standard deviation.

Figure 5.11: Comparison of simulation and measurement data, both curves normalized to their respective peak delay

The standard deviation in seconds at zero bias is approximately 1.3 ps, which is not far from the simulated value of 1.2 ps. In percent of the total delay however, the deviation is far lower in the measurements (0.76%) than in the simulations (1.18%).
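The relation between the absolute and relative figures can be made explicit with a short Python sketch. The simulated numbers are taken from the 0.4 µm row of table 5.1; the measured mean gate delay is not quoted directly here, so the sketch only infers it from the rounded 1.3 ps and 0.76% figures. The function names are assumptions made for illustration.

```python
def relative_sigma_percent(sigma_ps, delay_ps):
    """Standard deviation expressed in percent of the mean delay."""
    return 100.0 * sigma_ps / delay_ps


def implied_delay_ps(sigma_ps, percent):
    """Mean delay implied by an absolute and a relative standard deviation."""
    return 100.0 * sigma_ps / percent


# Simulated values from table 5.1 (0.4 um row, zero bias).
sim_percent = relative_sigma_percent(1.18618, 100.43)

# Measured values as quoted in the text (rounded).
measured_delay_ps = implied_delay_ps(1.3, 0.76)
implied_frequency_hz = 1.0 / (2 * 201 * measured_delay_ps * 1e-12)
```

The implied measured gate delay of roughly 171 ps corresponds to an oscillation frequency between 14 and 15 MHz for the 201-stage ring, in line with the measured frequency reported earlier. The lower relative deviation in the measurements therefore follows from the longer measured gate delay rather than from a smaller absolute spread.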

From a statistical point of view, 20 samples is a low count to base a conclusion upon, whereas the simulated standard deviation is found through Monte Carlo simulations of mismatch with 1000 iterations. The measurements do however give us an indication of the mismatch between the units. We see that the simulated deviations provide a rather good estimate of the actual deviations seen in measurements.

## 5.3.4 The oscillator as a VCO

When examining the first part of the measured curve in figures 5.11 and 5.14, it seems to display an almost linear behavior. In VCOs, this is often a desired property since it makes the VCO behavior predictable. In this section the linearity of the oscillator is explored.

Figure 5.12: Standard deviation from measured delay in seconds, based on measurements on 18 different dies.

Figure 5.13: Linear fit to standard deviation in measurements

To investigate the linearity of the oscillator, a first degree polynomial fit was performed in MATLAB. Because of the nonlinear behavior displayed at high biases, the polynomial fit is performed only on the first part of the curve. In figure 5.14 a plot of the linear fit is shown together with the residue from the fit, showing the error of the fit. It is clear that the curve is not entirely linear. The shape of the error curve indicates that a higher degree polynomial might fit the curve better. A second degree polynomial fit is shown in figure 5.15 together with its corresponding error curve. The seemingly random distribution of error indicates that the second degree polynomial is the best fit, at least for lower bias voltages. The quadratic behavior shown by the VCO is however weak, since the error in the linear fit is 1.5% at most. This weak quadratic tendency can be related to the quadratic behavior of the drain source current in the saturation region. In fine pitch processes this quadratic behavior is significantly reduced due to velocity saturation, explaining the close to linear behavior.
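The fitting procedure can be sketched as follows. The fits in this work were performed in MATLAB on the measured data; the Python snippet below is only an illustration on synthetic data with a weak, hypothetical quadratic term, and all names and numerical values in it are assumptions.

```python
def linear_fit(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic frequency curve: mostly linear with a weak quadratic term.
bias_v = [0.1 * i for i in range(6)]
freq_hz = [14.5e6 * (1 + 0.3 * v + 0.1 * v * v) for v in bias_v]

slope, intercept = linear_fit(bias_v, freq_hz)

# Residue of the fit in percent of the frequency at each bias point.
residual_pct = [
    100.0 * (f - (slope * v + intercept)) / f for v, f in zip(bias_v, freq_hz)
]
```

For such a curve the residue is positive at both ends of the fitted range and negative in the middle. A systematic, bowl shaped error of this kind is what suggests that a second degree polynomial is the better model, whereas a good fit leaves a residue without visible structure.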

From around 600 mV the curve seems to take another shape, but this is as expected since it is close to the diode threshold. One would typically aim at keeping the bias voltage below this threshold in an implementation to avoid excessive power consumption.

Another important property of VCOs is phase noise. Because of time constraints, the degree of phase noise in the implemented oscillator has not been measured. Ring oscillators do however often tend to exhibit a certain amount of phase noise. Further study of the VCO properties of the body biased ring oscillator is therefore required before a conclusive description of the performance of the oscillator as a VCO can be made.

## 5.3.5 Summary

The measurements have provided several interesting observations. First of all, the absolute gate delay of the inverters deviates considerably from the schematic simulations. Extracting parasitics from layout improves the simulation considerably, but there are still significant discrepancies between measured and simulated delay. Some of this comes from sources other than parasitic capacitors and resistors, with IR drop as a potential candidate. Since no special care was taken during layout regarding routing of the supply rails, it is possible that it contributes to the decreased speed of the oscillator.

As mentioned in the beginning of this section, we suspected that there might be noticeable differences between the simulated and measured behavior of the back gate. The measurements confirmed these suspicions.



Figure 5.14: Linear fit to measured frequency



Figure 5.15: Second degree polynomial fit to measured frequency

With the effect shown in these measurements, the back gate can probably be used in several applications. It can provide weak quadratic behavior in a VCO and it has great potential as a tunable delay element.

# 5.4 Time Difference Measuring Circuit

There are two fundamental techniques when measuring time. The first is to measure absolute time compared to a clock, and the other is to use relative time measurements. Absolute time measurements rely on synchronized clocks, which are hard to achieve when the devices are spread over some distance. GPS uses absolute clocks as references and depends on atomic clocks with high accuracy to stay synchronized. Implementing high precision clocks in WSN nodes is not a feasible option, making them rely on imprecise in-field synchronization.

Relative time measurements are done by measuring the time difference between two or more signals. By using relative time measurement techniques, all the time critical parts can be placed in the same unit. This makes it easier to control the environment for the time measurement, and hence increase the accuracy.

The main idea with this Time Difference Measuring Circuit (TDMC) is to use a small relative delay difference in two delay elements to measure a relative time difference between two pulses. By using this relative delay difference we hope to achieve a time resolution smaller than the gate delay. The delay elements are ordered in two parallel tapped delay lines of *N* elements, one with *normal* delays and a second with slightly *faster* delays. The difference in delay is called  $\Delta \tau$ . Detecting a time difference is done by sending the first pulse through the slow tapped line and the second pulse through the fast tapped line. After *n* delays the second pulse will overtake the first. By determining the number of delays (*n*) before this happens and knowing the  $\Delta \tau$  we can estimate the time difference ( $\tau_i$ ) between the two pulses. In figure 5.16 the two delay chains are shown.

To determine where the second pulse overtakes the first we need some sort of detector. The output of each detector is a function of the three inputs  $\tau_i$ ,  $\Delta \tau$  and *n*, and is represented by:

$$f(n, \Delta \tau, \tau_i) = \begin{cases} 1, & \tau_i \ge \Delta \tau \cdot n \\ 0, & \tau_i < \Delta \tau \cdot n \end{cases} \qquad \text{for } n \in [0, N]$$



Figure 5.16: Relative delay difference in parallel chains.

This shows that the outputs of the first *n* detectors are f = 1, while the remaining N - n are f = 0. In other words, the output is stored as a thermometer code.  $n_e$  is the thermometer output, i.e. the number of detectors with a high output, and relates to the inputs as follows:

$$\tau_i = \Delta \tau \cdot n_e \tag{5.13}$$

By solving this, the time difference between two incoming signals can be estimated.
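The detector model and equation 5.13 can be illustrated with a small sketch. The function names and the example values ($\Delta \tau$ = 5 ps, $\tau_i$ = 42 ps, N = 32) are assumptions chosen for illustration and are not taken from the implemented circuit. The sketch assumes ideal detectors and counts taps n = 1..N, so that the number of high outputs equals $n_e$.

```python
def detector_outputs(tau_i_ps, delta_tau_ps, n_elements):
    """Ideal detector outputs f(n) for taps n = 1..N.

    A detector is high as long as the first pulse is still ahead,
    i.e. while tau_i >= delta_tau * n.
    """
    return [1 if tau_i_ps >= delta_tau_ps * n else 0
            for n in range(1, n_elements + 1)]


def estimate_time_difference(outputs, delta_tau_ps):
    """Decode the thermometer code and apply tau_i = delta_tau * n_e (equation 5.13)."""
    n_e = sum(outputs)  # number of detectors with a high output
    return n_e, n_e * delta_tau_ps


# Hypothetical example values, not measured data.
code = detector_outputs(tau_i_ps=42.0, delta_tau_ps=5.0, n_elements=32)
n_e, tau_est = estimate_time_difference(code, delta_tau_ps=5.0)
print(n_e, tau_est)
```

In this idealised model the estimate is quantised to whole steps of $\Delta \tau$, so the estimation error stays below one $\Delta \tau$ as long as $\tau_i$ is within the range of the delay line.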

## 5.4.1 Motivation

In earlier literature, scaling has been used to create the relative delay difference between two delay elements [Abas 04]. With that approach it is not possible to tune the circuit after production. By using Body Bias (BB) it is possible to tune and change the relative delay difference between two identically scaled devices. The possibility of tuning the delay in a delay element after production was the main motivation for creating this circuit. The goal is to investigate the achievable relative precision using this technique.

Measuring a time difference between two pulses is useful in several applications, such as VCOs, AoA determination and ranging. This chapter will treat ranging in particular.

#### Achievable ranging accuracy

Ranging using Radio Frequency (RF) signals is based on the propagation of electromagnetic radiation. Electromagnetic radiation propagates with a speed close to 0.3 mm/ps in vacuum, which in terms of ranging can be considered a worst case scenario. In the TDMC presented here, the accuracy is given by the difference in delay between two delay elements. If one element is tuned $\Delta \tau$ = 5 ps faster than another, this yields a spatial resolution $\Delta d$ of:

$$c = 0.3 \text{ mm/ps}$$

$$\Delta d = c \cdot \Delta \tau = 0.3 \text{ mm/ps} \cdot 5 \text{ ps} = 1.5 \text{ mm}$$

The achievable ranging accuracy can potentially be in the order of a few millimeters, which is a significant improvement over existing ranging systems.
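The relation between time resolution and spatial resolution can be sketched as follows. The function name is hypothetical, and only the 5 ps case comes from the text; the other values are included purely for comparison.

```python
C_MM_PER_PS = 0.3  # propagation speed used in the text (close to c in vacuum)


def spatial_resolution_mm(delta_tau_ps, speed_mm_per_ps=C_MM_PER_PS):
    """Distance the signal travels during one time step delta_tau."""
    return speed_mm_per_ps * delta_tau_ps


# 5 ps is the value from the text; 2 ps and 10 ps are illustrative assumptions.
for delta_tau in (2.0, 5.0, 10.0):
    print(f"{delta_tau:4.1f} ps -> {spatial_resolution_mm(delta_tau):.1f} mm")
```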

## 5.4.2 Overview of the system

The TDMC uses two parallel tapped delay lines with BB tuning as the main feature. Each detector consists of one d-flip-flop with the D-input tapped from the slow delay line and the Clk-input tapped from the faster delay line. This implements the thermometer code where the first *n* d-flip-flop outputs are high, see figure 5.17.



Figure 5.17: The implemented TDMC with detectors.

The delay chains and the d-flip-flops are the key elements in the TDMC. When the measurement is completed, the next step is to decode the thermometer code. On this chip there is no need for the information on-chip, so it is shifted into a shift register which can then be shifted out on a single pad. By sending the output from the last d-flip-flop back into the first, the thermometer code can be read with an oscilloscope. The signal will have a period equal to the number of d-flip-flops times the input clock period. The duty cycle of the signal represents the thermometer code. If the duty cycle is 50%, half the detectors have a high output.
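The readout relation between period, duty cycle and thermometer code can be sketched as below. The function names are hypothetical. The 32 detectors are from the implementation described later, and the 200 kHz clock is the value used in the measurements in section 5.5.2.

```python
def readout_period_us(n_detectors, clock_hz):
    """Period of the recirculating shift-register output: N clock periods, in microseconds."""
    return n_detectors / clock_hz * 1e6


def n_e_from_duty_cycle(duty_cycle, n_detectors):
    """Number of high detectors implied by the duty cycle of the output signal."""
    return round(duty_cycle * n_detectors)


period = readout_period_us(32, 200e3)          # 32 detectors, 200 kHz shift clock
n_high = n_e_from_duty_cycle(0.5, 32)          # 50% duty cycle
print(f"period = {period:.0f} us, n_e = {n_high}")
```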

A problem when characterising the TDMC is to create the two input pulses. Creating two pulses with a time difference in the order of picoseconds is very difficult off-chip. The solution is to create two pulses on-chip using a pulse generator circuit. The input is a single pulse, which is then split into two different pulses with a small time difference.

The complete structure for the TDMC with input and output systems is shown in figure 5.18.



Figure 5.18: Schematic for complete circuit.

## 5.4.3 Implementation

The implementation of the TDMC is focused on proving the concept using tapped delay lines and d-flip-flops as detectors. Throughout our simulations and measurements the slow tapped delay line will not be biased while the faster will be. When BB is used later it will always refer to the biasing of the faster tapped line.

The dynamics of the TDMC are given by the combination of  $\Delta \tau$  and the number of elements in the line, which together determine the measurable time range. With two tapped delay lines of length N = 32 the dynamics are good. In figure 5.18 the S<0:2> input to the pulse generator goes to a decoder that selects one of eight different time differences. The time differences created by the pulse generator span a range of about 50 ps, see table 5.5. With this input range of  $\tau_i$  the dynamics of the TDMC can be thoroughly tested. The implemented system consists of the pulse generator, two tapped delay lines with 32 elements each, 32 detectors and a 32 bit shift register. The different parts are discussed briefly below.

Figure 5.19 shows the timing diagram for the entire circuit. This generic timing diagram shows how the inputs and the output are related for a given S<0:2> and BB combination.

**CK** is the clock input to the shift register.

**P**<sub>in</sub> is the pulse input to the pulse generator.

**ES** is the enable shift signal to the shift register.

**Out** is the output from the shift register.



Figure 5.19: Timing diagram for the implemented TDMC test circuit.

#### **Pulse generator**

Creating a very small time difference between two pulses is not easy in CMOS. One solution would be to use two tapped delay lines, select the pulses after *M* elements and send them into the TDMC. This would result in delay elements both creating and measuring the time difference. *M* elements in the pulse generator would result in *N* elements in the TDMC. For this to be useful, the two chains would have to be scaled differently or biased differently. Any error in the TDMC would be correlated with, and suppressed by, an error in the pulse generator. Without any earlier experience with body biasing this seemed like a bad idea, so a solution using RC delays was implemented instead. Combining buffers and varying RC delays to create the different time delays makes sure the generator and the TDMC are independent. This also means the pulse generator does not need any biasing, which reduces the number of external pads.

In the final pulse generator a long unsilicided N+ poly resistor is used in the RC delay. The capacitor is the parasitic capacitance from the resistor to the bulk. The different time delays are tapped after different lengths of this poly resistor, which creates some non-ideal effects. These effects come from nonlinear changes in the resistor caused by temperature changes and process mismatch, and from the output of the RC delays.

#### **Delay element**

The delay element is of utmost importance to the performance of the TDMC. A good delay element for the TDMC will have a small absolute standard deviation caused by mismatch. Preliminary simulations showed that smaller delay elements would yield smaller standard deviations.

In table 5.1 we see that an inverter can have a standard deviation of 0.34 ps. The standard deviation translates directly into the ranging precision. From this we can conclude that shorter inverters are good for the precision. Since this circuit is merely a proof of concept, longer transistors are used. Minimum-length transistors would require very small time differences from the pulse generator. By lengthening the transistors and increasing the effect of the body bias, the constraints on the pulse generator are reduced.

In table 5.4 simulations for different delay elements are listed. Each delay element consists of two inverters scaled as input and output stage, both listed in the table. The different delay elements have different sizes and number of fingers.

All the simulations have been performed with 0 V input to the back gate using the BSIM3 model.

With all this in mind the delay element *delay04wbg* was chosen to be used in our implementation. The schematic for the delay element is shown in figure 5.20.

| Name          | Input stage nMOS | Input stage pMOS | Output stage nMOS | Output stage pMOS | Monte Carlo μ | Monte Carlo σ |
|---------------|------------------|------------------|-------------------|-------------------|---------------|---------------|
| delay01       | $0.5 \cdot 4$    | $0.5 \cdot 6$    | $0.5 \cdot 10$    | $0.5 \cdot 14$    | 357.16 ps     | 1.9 ps        |
| delay01wbg    | $0.5 \cdot 4$    | $0.5 \cdot 6$    | $0.5 \cdot 10$    | $0.5 \cdot 14$    | 345.13 ps     | 1.9 ps        |
| delay04wbg    | $0.3 \cdot 2.4$  | $0.3 \cdot 3.6$  | $0.3 \cdot 6$     | $0.3 \cdot 9$     | 145.4 ps      | 1.26 ps       |
| delay04       | $0.3 \cdot 2.4$  | $0.3 \cdot 3.6$  | $0.3 \cdot 6$     | $0.3 \cdot 9$     | 156.58 ps     | 1.56 ps       |
| delay04wbghvt | $0.3 \cdot 2.4$  | $0.3 \cdot 3.6$  | $0.3 \cdot 6$     | $0.3 \cdot 9$     | 229.96 ps     | 2.41 ps       |
| delay04hvt    | $0.3 \cdot 2.4$  | $0.3 \cdot 3.6$  | $0.3 \cdot 6$     | $0.3 \cdot 9$     | 239.53 ps     | 2.5 ps        |

Table 5.4: Delay element sizes (L · W in μm) and Monte Carlo results using BSIM3.



Figure 5.20: Schematic of the delay element with sizes.

#### **D-flip-flop – the detector**

The detector, a d-flip-flop in this implementation, is also a critical component. An edge-triggered flip-flop architecture is used in our implementation. It has to be edge-triggered to work, since the goal is to detect the edge of the pulse. The timing is the critical property, since the main goal is to accurately determine the number of delay elements before the pulses pass each other. The required setup time for the data signal should be as small as possible and controllable. If the two pulses arrive at a d-flip-flop at exactly the same time, the d-flip-flop should ideally go high. Because of the setup time, this will not happen in the implemented system. The setup time will add a small negative time error to the estimate, which should be corrected for when estimating the time difference.

Any mismatch on the same die should also be avoided. The flip-flop architecture should be robust enough to handle mismatch, and this should be considered during the layout. This was taken lightly in this implementation and is a source of error in the measurements. Due to time limitations, an existing d-flip-flop implemented by Hans Berge was used.

## 5.4.4 Expected results

With the main focus on body biasing combined with a proof of concept, and time limitation during production, extensive simulations were not possible. Some preliminary simulations were done on the different components and a few simulations on the entire circuit to test the maximum and minimum inputs. Based on these simulations the circuit is expected to work with all inputs as long as BB/ $\Delta\tau$  is large enough.

After production, more thorough simulations have been completed and some unexpected results were found. First, the pulse generator was not tested enough and has some weaknesses that were found with Monte Carlo simulations. Second, the d-flip-flop was not simulated as a detector, although it was characterised by Hans Berge to handle a clock frequency of approximately 2.5 GHz. In this section the different components are studied more closely. At the end the expected achievable accuracy is discussed.

#### **Pulse generator**

Simulation results from the pulse generator schematics are seen in figure 5.21. The time difference between the pulses can be varied from approximately 125 ps to 175 ps, a variation of 50 ps. The Monte Carlo results are found in table 5.5, where the first weakness is seen. Looking at the figure and the table we see that the generated delays are not linearly spaced. This means special consideration must be taken when evaluating the simulations and measurements. By using these simulation results as the points along the x-axis, the plots are adjusted accordingly and become more readable. This is not one hundred percent accurate, but it is a better approach than using linearly spaced points along the x-axis.

This phenomenon is caused by the use of an RC delay, which creates a ramp function. By looking at the different taps, this ramp function can be compared to different time constants. The time constants increase further down the line, but the increase is not linear. The changes are smaller for larger  $\tau_i$ . This will increase the impact of noise on the signal and should result in an increased standard deviation in measurements for larger  $\tau_i$ .

The second problem with this implementation is the standard deviation found running Monte Carlo simulations. In table 5.5 a standard deviation of approximately 6 ps is observed for all the created time differences. These Monte Carlo simulations are done with 250 iterations and mismatch only. Simulations with process variations and mismatch showed a standard deviation  $\sigma = 10-12$  ps, which is large compared to the accuracy we hope to measure.
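The nonlinearity and its relation to the mismatch can be made explicit by computing the step between neighbouring settings from the mean values in table 5.5. This is only a sketch. The variable names are hypothetical, and the 6 ps reference is the approximate mismatch-only standard deviation quoted above.

```python
# Mean generated time differences from table 5.5, in ps, for S<0:2> = 000 .. 111.
TAU_I_PS = [126.8, 140.5, 151.5, 160.4, 166.3, 171.2, 173.3, 174.8]
SIGMA_PS = 6.0  # approximate mismatch-only standard deviation from the text

# Step between neighbouring settings; a linear generator would give equal steps.
steps = [round(b - a, 1) for a, b in zip(TAU_I_PS, TAU_I_PS[1:])]
total_range = round(TAU_I_PS[-1] - TAU_I_PS[0], 1)

# Steps that are smaller than the standard deviation of the generated delay.
below_sigma = [s for s in steps if s < SIGMA_PS]

print("steps (ps):", steps)
print("total range (ps):", total_range)
print("steps below sigma:", len(below_sigma), "of", len(steps))
```

The steps shrink monotonically along the line, and the later ones fall below the simulated standard deviation, which is why the largest $\tau_i$ settings are the hardest to separate.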



Figure 5.21: Simulation results for the pulse generator.

#### **Delay element**

| S<0:2> | $\tau_i$ | σ       | $\Delta \tau_i$ |
|--------|----------|---------|-----------------|
| 000    | 126.8 ps | 6.27 ps | -               |
| 001    | 140.5 ps | 6.01 ps | 13.7 ps         |
| 010    | 151.5 ps | 5.85 ps | 11 ps           |
| 011    | 160.4 ps | 5.91 ps | 8.9 ps          |
| 100    | 166.3 ps | 6.06 ps | 5.9 ps          |
| 101    | 171.2 ps | 6.00 ps | 4.9 ps          |
| 110    | 173.3 ps | 5.76 ps | 2.1 ps          |
| 111    | 174.8 ps | 5.78 ps | 1.5 ps          |

Table 5.5: Monte Carlo simulations for the pulse generator with 250 iterations and mismatch only, using BSIM4.

Simulation results for the delay element used can be seen in table 5.6 and figure 5.22. The table shows the effect of the back gate on the average delay and the standard deviation. The results deviate from the first simulations in table 5.4 since a new, improved model (BSIM4) is used. The BSIM4 simulations differ from the BSIM3 simulations by a lower mean delay but a significantly higher  $\sigma$ . By sweeping BB from 0 V to 0.5 V the mean delay is reduced by 8.74% and the standard deviation by 18.65%. This shows that the mismatch will decrease when the body bias is increased, as expected. This is also shown in earlier work by Narendra et al. [Nare 03].

The Monte Carlo simulations used mismatch and process variations and 1000 runs on the BSIM4 model.

| BB    | μ          | σ          | $\Delta \tau$ |
|-------|------------|------------|---------------|
| 0.0 V | 115.882 ps | 5.00024 ps | 0 ps          |
| 0.1 V | 112.918 ps | 4.73094 ps | 2.964 ps      |
| 0.2 V | 110.46 ps  | 4.51298 ps | 5.422 ps      |
| 0.3 V | 108.433 ps | 4.33084 ps | 7.449 ps      |
| 0.4 V | 106.882 ps | 4.1847 ps  | 9.000 ps      |
| 0.5 V | 105.748 ps | 4.06785 ps | 10.134 ps     |

Table 5.6: Monte Carlo simulations for the delay element with 1000 iterations, mismatch and process variations, using BSIM4.

As we can see in figure 5.22 the change in delay is close to linear with back gate input. We also see that the simulations show that we could achieve an accuracy of only a few picoseconds by biasing the delay chains with about 0.1 V difference.
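The numbers in table 5.6 can be cross-checked with a short sketch. It assumes, as in the text, that the slow line is left unbiased, so $\Delta \tau$ is the mean delay at 0 V minus the mean delay at the applied BB. The variable names are hypothetical and the BB values are taken to be in volts, in line with the sweep from 0 V to 0.5 V.

```python
# Mean delay and standard deviation from table 5.6 (BSIM4 Monte Carlo), in ps.
BB_V = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
MU_PS = [115.882, 112.918, 110.46, 108.433, 106.882, 105.748]
SIGMA_PS = [5.00024, 4.73094, 4.51298, 4.33084, 4.1847, 4.06785]

# Delta tau: delay of the unbiased element minus delay of the biased element.
delta_tau = [round(MU_PS[0] - mu, 3) for mu in MU_PS]

# Relative reduction over the full sweep from 0 V to 0.5 V, in percent.
mu_reduction = 100 * (MU_PS[0] - MU_PS[-1]) / MU_PS[0]
sigma_reduction = 100 * (SIGMA_PS[0] - SIGMA_PS[-1]) / SIGMA_PS[0]

for bb, dt in zip(BB_V, delta_tau):
    print(f"BB = {bb:.1f} V: delta tau = {dt:.3f} ps")
print(f"mean delay reduced by {mu_reduction:.2f} %, sigma by {sigma_reduction:.2f} %")
```

The computed reductions agree with the figures quoted above to within rounding of the tabulated values.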

#### **Grey-maps**

Figure 5.22: Simulation results for the delay element used in the TDMC.

When processing the thermometer code, the number  $n_e$  is the key information. When measuring the completed circuit, noise is inevitable. The noise is expected to be white Gaussian noise. If the same measurement is done several times, integrating over time, the result should be a PDF. The left plot in figure 5.23 shows the results from the entire circuit with BB  $= V_{bulk} = 0.1$  V while sweeping the different input time differences. The y-axis is the number of delay elements  $(n_e)$  before the second pulse overtakes the first. Values along the x-axis are the simulated delay differences between the two pulses from the pulse generator. Starting from equation 5.13, the expression for  $n_e$  is found to be  $\frac{\tau_i}{\Delta \tau}$, and the effect of  $\tau_i$  and  $\Delta \tau$  is easily seen. Each column represents a PDF for that time difference  $\tau_i$. In this example we see that almost all the measurements are at the same  $n_e$, the black area. A few measurements indicate no registered measurement. Light areas equal few or no registered measurements, while darker areas indicate more measurements. The complete plot is called a grey-map.

The second grey-map shows the behaviour as a function of the BB input. From this it is possible to see how  $n_e$  behaves as a function of BB. This can be compared to the measurements of the ring oscillator and the simulations of  $\Delta \tau$.
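How a grey-map is built from repeated readings can be sketched as follows. This is a minimal illustration, not the MATLAB processing used for the figures. The function name, the input labels and the readings are hypothetical.

```python
from collections import Counter


def grey_map(measurements, n_levels):
    """Build one normalised histogram (one grey-map column) of n_e per input setting.

    measurements maps an input label (for example a tau_i setting) to a list of
    repeated n_e readings. The result maps the same label to a list of length
    n_levels + 1 holding the relative frequency of each possible n_e.
    """
    columns = {}
    for label, readings in measurements.items():
        counts = Counter(readings)
        total = len(readings)
        columns[label] = [counts.get(n, 0) / total for n in range(n_levels + 1)]
    return columns


# Hypothetical readings, not measured data: two input settings, four repetitions each.
example = {"tau_a": [7, 7, 7, 8], "tau_b": [9, 9, 10, 10]}
columns = grey_map(example, n_levels=32)
print(columns["tau_a"][7], columns["tau_a"][8])
```

Each column sums to one, so a darker cell corresponds directly to a larger share of the readings at that $n_e$.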

#### **Complete circuit**

Simulations of the entire circuit were done without Monte Carlo to reduce simulation time. This means only one data set is available for the simulation results, which makes them somewhat less informative. All eight time differences  $\tau_i$  were simulated and the BB input was stepped by 0.025 V from 0 V to 0.5 V. The results of this simulation are seen in figures 5.24, 5.25 and 5.26. In the 3D plot we see the expected tendency for the circuit.



Figure 5.23: Example of grey-maps for the TDMC, showing the PDFs for different inputs.

Increasing BB, and thereby  $\Delta \tau$, results in a lower  $n_e$, as seen by the declining slope from left to right. With less than 0.1 V as the BB input all the detectors are activated, because the generated  $\tau_i$  is larger than the TDMC can detect, resulting in an overflow. In figure 5.25 the effect of changing  $\tau_i$  is easily seen. For all the different BB inputs, increasing  $\tau_i$  results in a larger  $n_e$  as expected. It is also possible to see the reduced accuracy when increasing BB by the declining slope. In figure 5.26 the  $n_e$  behaves as expected from the simulations of the delay elements.

Figure 5.25 shows that the circuit behaves linearly except for the largest  $\tau_i$, clearly seen for Vbulk = 0.15–0.2 V. One effect caused by the body biasing is an increased gate capacitance of the transistors. In figure 5.27 the simulated input capacitance is plotted as a function of BB. The gate capacitance increases with increasing BB until BB is approximately 0.225 V, after which it starts to decline. Both the tapped delay lines are buffered by an inverter as a last stage before entering the line. These inverters charge different loads, resulting in different time delays. This in turn means that the generated delay from the pulse generator will increase somewhat. Simulations show that this increase is about 30 ps at the maximum, BB = 0.225 V. When BB is increased past this point the capacitance is reduced and the extra time delay is also reduced. This is assumed to be the cause of the small dip found for the largest  $\tau_i$  at Vbulk = 0.15–0.2 V.

Figure 5.24: 3D view of the simulation results for the TDMC.

The gate capacitance was determined through simulations. Several delay elements were connected in series, and the input was an ideal signal generator charging the first gate through a 100 k$\Omega$  resistor. Simulating with different BB inputs and measuring the time to reach 0.632 V on the gate gave the RC delay. Solving t = RC for C then gives the simulated gate capacitance. The increasing gate capacitance is caused by the reduced depletion region beneath the gate. By applying a positive BB on the bulk, the depletion regions around the source and drain will shrink, and the channel beneath the gate will be shallower. Most of the gate capacitance lies between the gate and the bulk, through the gate oxide and the channel. A thinner channel will reduce the distance between the gate and the bulk and hence increase the capacitance. Why this effect declines after approximately 0.225 V is unknown. The BSIM4 models used for BB have already been shown to be inaccurate at best, so these results should be investigated further, but this is outside the scope of this thesis.
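The extraction of the capacitance from the RC delay can be sketched as below. The 100 k$\Omega$ resistor is from the text. The 1 V step amplitude is an assumption, chosen because 0.632 V then corresponds to one time constant, and the 1 ns rise time is a hypothetical value used only to show the arithmetic.

```python
import math

R_OHM = 100e3  # series resistor from the text


def capacitance_from_rise_time(t_seconds, r_ohm=R_OHM):
    """Solve t = R * C for C, where t is the time to reach 63.2% of the final value."""
    return t_seconds / r_ohm


def rc_step_response(t_seconds, r_ohm, c_farad, v_final=1.0):
    """Voltage on the capacitor t seconds after an ideal step of v_final volts (assumed 1 V)."""
    return v_final * (1.0 - math.exp(-t_seconds / (r_ohm * c_farad)))


# Hypothetical rise time of 1 ns, not a simulated or measured value.
c_gate = capacitance_from_rise_time(1e-9)
v_at_tau = rc_step_response(1e-9, R_OHM, c_gate)
print(f"C = {c_gate * 1e15:.1f} fF, v(t = RC) = {v_at_tau:.3f} V")
```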

Figure 5.25: Grey-map of simulations as a function of generated delay for the TDMC.

Figure 5.26: Grey-map of simulations as a function of BB for the TDMC.

Figure 5.27: The effect of BB on gate capacitance for the delay element used in the TDMC.

In table 5.7 three BB inputs with outputs are shown. Starting with a BB of 0.4 V ( $\Delta \tau = 9.00 \text{ ps}$ ), the lack of accuracy is seen in  $n_e$. For the first four  $\tau_i$  inputs there is almost no change in  $n_e$. Each increase should raise the output by one, but the change only happens twice. When the BB is reduced to 0.3 V a larger change of  $n_e$  is seen. The first increase of  $\tau_i$  should result in an increase of  $n_e$  by two compared to the first input, which we do not see. This can happen if the first two inputs are close to the edges of different  $\Delta \tau$  steps. The first can be towards  $n_e = 6$  while the second is closer to  $n_e = 9$. In measurements this would result in two grey areas in the grey-maps when noise affects the decision in either direction. The overall accuracy with 0.3 V BB seems to be around the simulated  $\Delta \tau$.

With a BB of 0.2 V we see indications of even better accuracy. Here the first changes in  $\tau_i$  seem to result in changes in  $n_e$  as expected with  $\Delta \tau = 5.42$  ps. The correlation between  $\Delta \tau$  and the changes in  $\tau_i$  is better for smaller changes in  $\tau_i$  and indicates an accuracy of around 5 ps.

All in all the circuit seems to perform well and should yield reasonable measurements.

|          | BB                                     | 0.4 V   | 0.3 V   | 0.2 V   |
|----------|----------------------------------------|---------|---------|---------|
| $\tau_i$ | $\Delta \tau_i \backslash \Delta \tau$ | 9.00 ps | 7.45 ps | 5.42 ps |
| 126.8 ps | -                                      | 6       | 7       | 10      |
| 140.5 ps | 13.7 ps                                | 7       | 8       | 12      |
| 151.5 ps | 11 ps                                  | 7       | 9       | 13      |
| 160.4 ps | 8.9 ps                                 | 8       | 10      | 14      |
| 166.3 ps | 5.9 ps                                 | 8       | 11      | 15      |
| 171.2 ps | 4.9 ps                                 | 9       | 11      | 16      |

Table 5.7: A few simulation values from the TDMC.
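To make the comparison in table 5.7 easier to follow, the sketch below sets the ideal, non-quantised step $\Delta \tau_i / \Delta \tau$ next to the simulated change in $n_e$ for each BB setting. The values are copied from the table; the variable and function names are hypothetical.

```python
# Increments of the generated time difference from table 5.7, in ps.
DELTA_TAU_I_PS = [13.7, 11.0, 8.9, 5.9, 4.9]

# Simulated n_e from table 5.7, keyed by delta tau in ps (BB = 0.4 V, 0.3 V, 0.2 V).
SIMULATED_N_E = {
    9.00: [6, 7, 7, 8, 8, 9],
    7.45: [7, 8, 9, 10, 11, 11],
    5.42: [10, 12, 13, 14, 15, 16],
}


def steps(values):
    """Change between neighbouring entries of a list."""
    return [b - a for a, b in zip(values, values[1:])]


for delta_tau, n_e in SIMULATED_N_E.items():
    ideal = [round(step / delta_tau, 2) for step in DELTA_TAU_I_PS]
    print(f"delta tau = {delta_tau} ps")
    print("  ideal steps:    ", ideal)
    print("  simulated steps:", steps(n_e))
```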

## 5.4.5 Layout considerations

Significant effort was put into matching the time critical parts with regard to wire lengths and parasitic capacitance. Making the eight buffers and pass gates in the pulse generator as identical as possible meant placing them in a fixed grid and using the same length of wire to connect everything together. The last stage before entering the TDMC is, for both pulses, an identical inverter placed to minimise any parasitics.

In the TDMC, tiny inverters are used when tapping the signal from the delay lines. This keeps the added capacitance at a minimum in order to reduce the added delay. It also means that the signals are buffered one more time before driving the d-flip-flop inputs. The delay elements in the delay lines are placed symmetrically around the detector between them. This was done to reduce any mismatch in the circuit created by parasitics.

# 5.5 TDMC measurements

In this section the measurements for the TDMC are presented.

## 5.5.1 Measurement setup

A small Printed Circuit Board (PCB) was created so that it was easy to test the circuit manually and from MATLAB, using the General Purpose Interface Bus (GPIB) interface for the necessary instruments. The measurement setup consists of several voltage sources, an oscilloscope and a signal generator. The voltage sources were controlled from MATLAB together with the readout from the oscilloscope, making it possible to automate the measurements. By automating the measurements it was possible to do several hundred measurements on each chip, so that good estimates could be made.

The measurements were done with the same inputs as the simulation of the complete circuit. BB was swept from 0 V to 0.5 V for each of the eight generated time differences  $\tau_i$ . The measurements are conducted on eight chips with 100 measurements or more for each data point.

## 5.5.2 Measurements

During the initial measurements a problem with the output arose. The period of the output signal was not 160 µs as expected with an input clock of 200 kHz and the 32 bit long shift register. To separate these inconsistent measurements from the others, all measurements with a period other than $160\pm1$ µs are discarded. The result is a very low number of measurements for low values of BB. When a new and better signal generator was used to create the clock input to the shift register, a drastic improvement was observed. Lack of internal clock buffering is suspected to be the cause. To determine how reliable the measurements are, figure 5.28 shows, in percent, how many of the measurements gave a valid output. All the 1100 measurements for each data point are considered, and a BB of 0.2 V or higher will result in very reliable measurements. The plot shows the average number of measurements obtained at a data point.
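The validity check on the output period can be sketched as follows. The clock frequency, register length and tolerance are from the text, while the function names and the list of periods are hypothetical and do not represent measured data.

```python
CLOCK_HZ = 200e3      # shift-register clock from the text
N_BITS = 32           # length of the shift register
TOLERANCE_US = 1.0    # accepted deviation from the expected period

EXPECTED_PERIOD_US = N_BITS / CLOCK_HZ * 1e6  # 32 clock periods of 5 us each


def is_valid(period_us):
    """Keep a measurement only if its output period is within the tolerance."""
    return abs(period_us - EXPECTED_PERIOD_US) <= TOLERANCE_US


def valid_fraction(periods_us):
    """Share of the measurements at one data point that pass the period check."""
    return sum(is_valid(p) for p in periods_us) / len(periods_us)


# Hypothetical measured periods in microseconds.
periods = [160.0, 159.6, 163.2, 160.9, 155.0]
print(f"{100 * valid_fraction(periods):.0f} % valid")
```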



Figure 5.28: 3D view showing the number of valid measurements from the TDMC for all inputs.

Figure 5.29 shows the mean measurements at each data point for all the chips. The behaviour is as expected:  $n_e$  increases with larger  $\tau_i$  and smaller  $\Delta \tau$. When compared to the simulated results in figure 5.24, the surface of the measured results is smoother. The reason for this is the averaging of the measurements and the effect of noise on the measured  $n_e$. The noise will result in small changes in the pulse ramp and cause the threshold to be crossed at slightly different times. With enough noise, and two pulses passing each other between two delays, the TDMC will detect different  $n_e$, and the average will yield a better estimate than a single measurement or the median.

Figure 5.29: 3D view showing the mean measurements from the TDMC.

To better see the difference between the simulation and the mean measurement,  $n_{diff} = n_{e_s} - n_{e_m}$, the simulated  $n_e$  minus the mean measured  $n_e$, is plotted in figure 5.30. It clearly shows that the simulation deviates more from the measurements at lower BB, and that the difference increases slightly with higher  $\tau_i$, but overall the difference is small and stable. The increase in error for lower BB is also seen in figure 5.9. The absolute difference between the simulated and measured BB effect is larger for lower BB, and hence results in a larger difference between simulation and measurements.

In figure 5.31 the simulations, mean measurements and  $n_{diff}$  are plotted together. The measurements follow the simulations well, but detect slightly smaller  $n_e$ . This can be explained by the inaccurate BB modeling of the transistor models shown by the ring oscillator measurements provided earlier. Since the biased delay elements actually behave faster in reality than predicted by simulations, the number of elements before the slower pulse is overtaken is lower in measurements.

#### Process variations and noise

We will start by looking at the deviations caused by noise in the individual chips. Since several measurements were performed on each chip, the average of all these measurements and the corresponding standard deviation ( $\sigma$ ) around this average can be found for each of the considered chips.



Figure 5.30: 3D view showing the difference between simulations and mean measurements for the TDMC.

Figure 5.31: 3D view showing the simulations, mean measurements and the mean error for the TDMC.

The exact cause of this noise is hard to determine, but most of it probably stems from noise in the pulse generator.

In figure 5.32 the resulting standard deviation for all chips has been averaged. It looks fairly random, but it is possible to see the tendency of higher standard deviation at lower BB and higher  $\tau_i$  as expected. The increased  $\sigma$  for larger  $\tau_i$  is caused by the reduced SNR. The increase in  $\tau_i$  is smaller for higher values than for lower values, and thus the noise is more prominent. A  $\sigma < 0.5\Delta\tau$  ( $n_e$ ) indicates a precision of about  $1\Delta\tau$  for 95% of occurrences (a confidence interval of 95%).
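The link between $\sigma < 0.5\Delta\tau$ and the 95% figure follows from the two-sigma interval of a Gaussian distribution, as the sketch below shows. It assumes the noise is Gaussian, as stated earlier, and the function name is hypothetical.

```python
import math


def coverage(k):
    """Probability that a Gaussian sample lies within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


sigma_in_steps = 0.5                  # sigma expressed in units of delta tau (n_e steps)
half_width = 2.0 * sigma_in_steps     # +/- 2 sigma corresponds to +/- 1 delta tau
print(f"+/- {half_width:.1f} delta tau covers {100 * coverage(2.0):.2f} % of the readings")
```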



Figure 5.32: 3D view showing the average standard deviation of  $n_e$  from the TDMC, illustrating the effect of noise.

In figure 5.33, all the measurements are considered together. This plot therefore includes process variations in addition to the noise between individual measurements. The deviation is much higher here, meaning process variations contribute considerably compared to the noise in the individual chips. The deviation is actually almost four times  $\Delta \tau$. The  $\tau_i$  of 166.3 ps stands out with a higher  $\sigma$  than its neighbors. This is caused by the fact that one chip deviates from the rest at this  $\tau_i$, as explained in the following section.
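The difference between the two standard deviations, averaged per chip versus pooled over all chips, can be sketched as follows. The chip names and readings are hypothetical and only illustrate how chip-to-chip offsets inflate the pooled value. Population standard deviations are used here as a simplification.

```python
import statistics

# Hypothetical n_e readings for three chips at one data point, not measured data.
chips = {
    "chip_a": [10, 10, 11, 10],
    "chip_b": [12, 13, 12, 12],
    "chip_c": [8, 8, 8, 9],
}

# Noise only: standard deviation within each chip, averaged over the chips.
per_chip_sigma = statistics.mean(statistics.pstdev(r) for r in chips.values())

# Noise plus chip-to-chip variation: all readings pooled before taking sigma.
pooled = [n for readings in chips.values() for n in readings]
pooled_sigma = statistics.pstdev(pooled)

print(f"average per-chip sigma: {per_chip_sigma:.2f}")
print(f"pooled sigma:           {pooled_sigma:.2f}")
```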



Figure 5.33: 3D view showing the standard deviation of  $n_e$  for all inputs from the TDMC, including process variations.

#### Grey-maps

Figures 5.34 and 5.36 are grey-maps of all the 1100 measurements, while figures 5.35 and 5.37 show the measurements of a single chip. Comparing the single chip with the sum of all chips, it is clear that the circuit is susceptible to process variations. The single chip shows unambiguous  $n_e$  measurements while the sum gives much more diverse results. Looking at the grey-maps with regard to BB the same tendency is seen.

For  $\tau_i$  at 166.3 ps and 171.2 ps it is clear that one chip deviates noticeably from the mean at each level. In figure 5.38 all chips are plotted together, and the two chips can be identified. These differences originate from the pulse generator, most likely from the threshold of the buffer after the RC delay. This is the reason for the noticeably higher standard deviation at  $\tau_i = 166.3$  ps for all the measurements in figure 5.33.

Figure 5.34: Grey-map of all the measurements from the TDMC as a function of generated delay.

Figure 5.35: Grey-map of a single chip as a function of generated delay.

Figure 5.36: Grey-map of all the measurements from the TDMC as a function of BB.

Figure 5.37: Grey-map of a single chip as a function of BB.

Figure 5.38: Mean measurements for all chips as a function of BB, with two standing out from the mean.

The average results for a single chip are found in table 5.8. With a BB of 0.2 V it starts out with a jump of 3, which fits well with  $\Delta \tau$  and the change in  $\tau_i$. The next changes in  $\tau_i$  do not result in the expected change in  $n_e$. The jump from  $n_e = 11$  to  $n_e = 13$  when  $\tau_i$  changes by 5.9 ps also seems strange for that difference in delay. Increasing the BB to 0.3 V or 0.4 V does not change the lack of correlation between  $\Delta \tau$  and  $n_e$  as a function of simulated  $\tau_i$.

|          | BB                                    | 0.4 V   | 0.3 V   | 0.2 V   |
|----------|---------------------------------------|---------|---------|---------|
| $\tau_i$ | $\Delta \tau_i \setminus \Delta \tau$ | 9.00 ps | 7.45 ps | 5.42 ps |
| 126.8 ps | _                                     | 4       | 5       | 7       |
| 140.5 ps | 13.7 ps                               | 5       | 6       | 10      |
| 151.5 ps | 11 ps                                 | 5       | 7       | 10      |
| 160.5 ps | 8.9 ps                                | 6       | 7       | 11      |
| 166.3 ps | 5.9 ps                                | 6.6     | 9       | 13      |
| 171.2 ps | 4.9 ps                                | 6.4     | 9       | 13      |

Table 5.8: Mean of measurements from the TDMC for a single chip.
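The comparison made in the text can be written out as a small sketch. It assumes an idealized first-order model in which each step in $\tau_i$ should move $n_e$ by roughly $\Delta \tau_i / \Delta \tau$. All numbers are copied from table 5.8, and the names `expected_steps` and `measured_steps` are only illustrative.

```python
# Values copied from table 5.8 (single chip).
# ASSUMPTION: ideally, a step d_tau_i in the generated delay moves n_e by
# about d_tau_i / delta_tau. This first-order model is for illustration only.

DELTA_TAU_I_PS = [13.7, 11.0, 8.9, 5.9, 4.9]      # steps in generated delay
DELTA_TAU_PS = {0.4: 9.00, 0.3: 7.45, 0.2: 5.42}  # delay difference per BB [V]
N_E = {
    0.4: [4, 5, 5, 6, 6.6, 6.4],
    0.3: [5, 6, 7, 7, 9, 9],
    0.2: [7, 10, 10, 11, 13, 13],
}


def expected_steps(bb):
    """Ideal change in n_e for each step in tau_i at body bias bb."""
    return [d / DELTA_TAU_PS[bb] for d in DELTA_TAU_I_PS]


def measured_steps(bb):
    """Observed change in mean n_e between neighbouring tau_i values."""
    values = N_E[bb]
    return [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    for bb in sorted(DELTA_TAU_PS):
        print(f"BB = {bb} V")
        for exp, meas in zip(expected_steps(bb), measured_steps(bb)):
            print(f"  expected {exp:4.2f}, measured {meas:4.2f}")
```

For a BB of 0.2 V the first step gives an expected change of about 2.5 against a measured 3, while the 5.9 ps step gives about 1.1 against a measured 2.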

#### 5.5.3 Summary

The idea of using body biasing to obtain relative time difference measurements is interesting because it allows post-production tuning of the circuit. This is important, first of all because it means the range and accuracy can be changed in the field, but also because it can be used to compensate for process variations.

The measurements were promising and provided important observations. The pulse generator has some issues, though, so the actual performance of the circuit in terms of accuracy is hard to determine. Measurements indicate that an accuracy of 5–10 ps can be achieved. The reader should, however, keep in mind that the chip was produced to provide a proof of concept, in which we have succeeded.

When we look at the grey-maps of a single chip, we see a clear tendency in the behaviour of the circuit. For low bias voltages, the circuit fails. This is hardly surprising, since it only means that $\Delta \tau$ is too small to be detectable within the range of the implemented circuit. As the bias voltage is increased, the results become more interesting. In figure 5.35 we see a close to linear tendency in $n_e$ as a function of $\tau_i$ for bias voltages higher than approximately 100 mV.

As with the ring oscillator, the sample count is too low, from a statistical point of view, to base a conclusion upon. The measurements do, however, indicate that the circuit is vulnerable to process variations. When comparing the standard deviation between individual measurements on a single chip to the standard deviation of the total collection of samples, it is obvious that the variations between the chips are prominent. A deviation of up to 4 times $\Delta \tau$ is significant, and is an issue that needs attention in a future implementation.
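The comparison of single-chip and pooled deviations can be illustrated by splitting the total variance into a within-chip part (noise) and a between-chip part (process variation). The sketch below uses made-up $n_e$ readings for three hypothetical chips, not measured data, and assumes the same number of samples per chip so that the split is exact.

```python
from statistics import fmean, pvariance

# HYPOTHETICAL n_e readings for three chips at one (tau_i, BB) setting.
# These are illustrative numbers only, not measurements from the thesis.
readings = {
    "chip_a": [10, 10, 11, 10],
    "chip_b": [13, 14, 13, 13],
    "chip_c": [7, 7, 8, 7],
}


def variance_split(groups):
    """Return (within, between, total) population variances.

    within  : mean of the per-chip variances (measurement noise)
    between : variance of the per-chip means (chip-to-chip spread)
    total   : variance of all samples pooled together
    Assumes every chip has the same number of samples.
    """
    within = fmean(pvariance(v) for v in groups.values())
    between = pvariance([fmean(v) for v in groups.values()])
    pooled = [x for v in groups.values() for x in v]
    return within, between, pvariance(pooled)


if __name__ == "__main__":
    within, between, total = variance_split(readings)
    print(f"within-chip std : {within ** 0.5:.2f}")
    print(f"between-chip std: {between ** 0.5:.2f}")
    print(f"pooled std      : {total ** 0.5:.2f}")
```

With equal sample counts the pooled variance equals the sum of the two parts, so a pooled deviation far above the single-chip deviation points to chip-to-chip differences rather than noise.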

Obtaining precision and accuracy in the order of a few millimetres using RF signals is hard to accomplish using traditional techniques, since it means measuring time differences in the order of a few picoseconds. The circuit described here solves this without the need for high precision clocks, and is implementable in all triple well CMOS processes. Although the implemented circuit has some issues, most of these reside in the pulse generator and not in the measurement circuitry. The measurements therefore provide a solid proof of concept, but before the exact precision and accuracy achievable with this scheme can be found, there are some practical issues that need to be solved.

One of the challenges in future realisations is how to avoid the added delay when entering the delay line. Simulations estimated this extra delay to be about 30 ps, which would add 9 mm when used in ranging. One solution to this problem can be to tune both delay lines to reduce the difference in gate capacitance. Another is to use very large buffers as the last gate before the delay line. Biasing both delay lines is an interesting idea. This will reduce the effect of mismatch on both delay lines, resulting in higher precision. The extra delay due to higher gate capacitance at the first gate is reduced, and it will also be possible to create a very small $\Delta \tau$. Biasing both lines should be tried in any future work. The effect of reversed (negative) body bias should also be investigated.

### Chapter 6

# **Concluding remarks**

Different techniques for measuring distances using RF signals have existed for quite some time. For a new technology to be viable it has to offer something new over the already existing technologies. In wireless sensor nodes there are several constraints that need to be considered, and many of the more traditional approaches to the ranging problem are not suited. To achieve high resolutions without exceeding these constraints, new solutions are therefore required.

It should be obvious that the best solution is the scheme yielding the highest resolution for the lowest cost. Transmission and reception of pulses consume energy, and the number of extra transmissions needed for ranging should be minimized. It is possible to perform range estimation on already existing signals through the use of RSS. The accuracy of this technique is, however, in the order of several meters. Many of the potential applications for a WSN require far better resolution than this.

Recently, UWB technology has received considerable attention due to its high bandwidth and the possibilities it offers for short-range communication and ranging. A major issue with many of the proposed UWB ranging schemes is their complex receiver structures and synchronization requirements. Active Echo exploits the properties of UWB-IR signals to achieve very high spatial resolution using only low-complexity digital components. This is important because it means the scheme can be implemented in virtually any micro- or nanoscale process. Converting an existing implementation to a newer and finer-pitched process can be accomplished with little effort, and since the resolution depends directly on gate delay, the extra performance comes automatically with a faster process.

Although the Active Echo scheme uses dedicated pulse transmissions, these are required if high accuracy and precision are desired. It is in the end a trade-off between resolution and power consumption, and the extra transmissions required with Active Echo compared to RSS-based approaches are justifiable when considering an improvement in accuracy of two orders of magnitude, or maybe even more. The achievable accuracy is to a first approximation only limited by the resolution of the TDMC, and by implementing this with body biased delay elements, as suggested in the previous chapter, a time resolution below 10 ps is achievable. This corresponds to a spatial accuracy of 1.5 mm!
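The relation between time resolution and spatial accuracy is a direct conversion via the propagation speed. The sketch below assumes free-space propagation at the speed of light and that the measured time covers the round trip between the main device and the echo device, so the distance is half of $c \cdot \Delta t$. The function name is illustrative.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def time_to_distance_mm(dt_ps, round_trip=True):
    """Convert a time difference in picoseconds to a distance in millimetres.

    ASSUMPTION: free-space propagation. With round_trip=True the time is
    taken to cover the path out and back, so the range is half of c * dt.
    """
    distance_m = C_M_PER_S * dt_ps * 1e-12
    if round_trip:
        distance_m /= 2.0
    return distance_m * 1e3


if __name__ == "__main__":
    # 10 ps round-trip resolution -> about 1.5 mm in range
    print(f"{time_to_distance_mm(10):.2f} mm")
    # 30 ps treated as a one-way delay -> about 9 mm
    print(f"{time_to_distance_mm(30, round_trip=False):.2f} mm")
```

A one-way conversion of 30 ps gives about 9 mm, which matches the figure quoted for the added delay at the delay line entry in section 5.5.3.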

To truly get a grasp of the achievable performance, the precision of the ranging system has to be considered as well. The analysis revealed that there are some potentially limiting factors on the precision. The delta error must be considered, and some sort of compensation is needed. NLOS multipath conditions are another case requiring attention. Minor reflections can be compensated for, while completely blocked signal paths are harder to work around.

The initial problem in this thesis was to find a solution to the ranging problem in WSN applications. The proposed scheme solves this problem, and should be easy to implement within the constraints of a WSN node. Compared to similar alternatives, the hardware needed in Active Echo is minimal. Due to the limited time available for this thesis, no working prototype was produced. The preliminary and methodical analysis presented in this thesis should, however, provide a foundation for a future implementation of the scheme. The proposed scheme is, to the best of our knowledge, a novel solution for WSN nodes, with at least two orders of magnitude of improvement in ranging precision and accuracy. All of this is feasible in a standard digital CMOS technology, is fully scalable, and requires little extra effort to implement.

#### 6.1 Future work

To properly verify the proposed scheme, a prototype must be implemented, with working versions of an echo and a main device. There are some practical issues that need to be solved first, though.

A single delay line sampler spanning the entire range of the system would have to be rather long. Consider for instance a delay line with a resolution of 40 ps ($\approx$ 6 mm). With 1000 samplers in the delay line this corresponds to a range of 6 m. To extend the range without requiring additional delay elements, an initial delay can be implemented before the delay line. By adjusting this delay, the range can be swept. The tunable delay elements should come in handy here. This does of course require additional pulse transmissions and receptions. The smaller circuitry gained from this must therefore be weighed against the extra cost in terms of power dissipation from pulse transmissions.
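The trade-off can be made concrete with a rough sketch. It assumes round-trip timing at the speed of light, so that each 40 ps tap covers about 6 mm of range, and that the initial delay is stepped by exactly one delay line length per sweep. The function names and the 20 m target are illustrative assumptions.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def window_m(n_taps, tap_ps):
    """Range covered by one pass of the delay line (round trip assumed)."""
    return n_taps * tap_ps * 1e-12 * C_M_PER_S / 2.0


def sweep_plan(target_m, n_taps, tap_ps):
    """Initial delays (in ps) needed to cover target_m with one delay line.

    ASSUMPTION: the initial delay is stepped by one full delay-line length
    per sweep, and every step costs at least one extra pulse exchange.
    """
    sweeps = math.ceil(target_m / window_m(n_taps, tap_ps))
    return [k * n_taps * tap_ps for k in range(sweeps)]


if __name__ == "__main__":
    print(f"one window: {window_m(1000, 40):.2f} m")
    # Covering a hypothetical 20 m with 100 taps instead of 1000:
    delays = sweep_plan(20.0, 100, 40)
    print(f"{len(delays)} sweeps, last initial delay {delays[-1] / 1000:.0f} ns")
```

Shrinking the delay line by a factor of ten multiplies the number of sweeps, and thereby the number of pulse exchanges, by roughly the same factor.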

If the full potential of the scheme is to be achieved, the delta error needs to be compensated for. This means that some sort of offset correction must be added to the ToF estimation algorithm. There are several conceivable ways to achieve this. If some knowledge about the channel path loss is available, the received signal strength can be coarsely approximated using the already obtained but imprecise distance estimate. This way the precision can be improved somewhat.
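A coarse signal strength estimate of this kind could look like the sketch below. It assumes free-space path loss, where power falls off with the square of the distance, and uses the $SNR_p$ of 7 dB at 10 m from the grey-map example in chapter 4 as a reference point. The function name and the path loss exponent parameter are illustrative, and the mapping from $SNR_p$ to an actual delta error offset is left out since it depends on the pulse shape and threshold.

```python
import math


def estimate_snr_db(distance_m, ref_snr_db=7.0, ref_distance_m=10.0,
                    path_loss_exponent=2.0):
    """Estimate the pulse SNR at distance_m from a known reference point.

    ASSUMPTION: log-distance path loss with a fixed exponent (2 corresponds
    to free space) and a constant noise level at the receiver.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    loss_db = 10.0 * path_loss_exponent * math.log10(distance_m / ref_distance_m)
    return ref_snr_db - loss_db


if __name__ == "__main__":
    for d in (3.0, 10.0, 15.0):
        print(f"{d:5.1f} m -> {estimate_snr_db(d):5.1f} dB")
```

Under these assumptions the sketch reproduces the roughly 3.5 dB at 15 m and 17.5 dB at 3 m quoted for the other grey-map examples in chapter 4.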

Relying on the expected path loss is, however, not a very good solution due to the uncertainty this introduces. Using the swept threshold sampler, the received signal can be accurately reconstructed for all *SNRs*. Knowing this, the nodes could estimate the noise and signal levels prior to the main ranging procedure, for instance using a training sequence of some sort.

Several of the potential applications for short range and high precision ranging schemes use highly NLOS affected channels. The ToF algorithms, in particular the Max selection algorithm, need to be extended if they are to resolve NLOS conditions.

### Appendix A

### Layout

This appendix contains the layout diagrams of the implemented circuits.

### A.1 Ring Oscillator

Figure A.1 shows the layout of the inverter used in the ring oscillator. Notice the N-well and the deep N-well surrounding the NMOS transistor. The ring oscillator is shown in figure A.2. Because the oscillator consists of 201 identical inverters, the mid section is left out. The output buffer can be seen on top of the oscillator.

### A.2 Time Difference Measuring Circuit

Figure A.3 shows the delay element used in the TDMC. Notice the deep N-well around the NMOS at the bottom and the large distance required between the two transistors because of it. The d-flip-flop used as a detector is seen in figure A.4. Its somewhat awkward shape comes from reshaping it to fit between two delay elements. In figure A.5 two delay elements, two inverters and a d-flip-flop are packed together to create a cell with both delay lines and the detector. This made it easy to adjust the number of elements in the tapped delay line. The RC-delays and the output buffer of the pulse generator are seen in figure A.6. The resistor snakes back and forth, and the different taps are visible.



Figure A.1: Layout of inverter used in the ring oscillator



Figure A.2: Complete layout of the ring oscillator. To make the picture more comprehensible, the midsection of the structure has been left out, since the entire ring oscillator consists of 201 identical inverters.



Figure A.3: Layout of the delay element implemented in the TDMC. The input is the left blue wire, and the right is the output. The PMOS-transistor is located at the top, while the NMOS-transistor in a p-well is located below.



Figure A.4: The d-flip-flop used as a detector. The inputs are seen as **Clk** and **M1**, and the output is the cyan wire to the left of the lower half. The shape is chosen to fit between two delay elements.



Figure A.5: Two delay elements, two inverters and a d-flip-flop connected as a cell. These cells were stacked to create the delay line. The delay line inputs are on the left side, and the d-flip-flop output is the green wire running to the top.



Figure A.6: The RC-delay in the pulse generator. The output inverters and pass gates are seen at the top. The two pulses enter this part on the two blue wires at the bottom, and the delayed pulses leave at the two inverters at the top. The signals from the decoder enter from the right at the top. The thick lines are the supply lines.

### Appendix B

### PCB and Measurement setup

### B.1 Measurement setup

Pictures of the measurement setup of the two circuits are included in figures B.1 and B.2. The ring oscillator PCB in figure B.1 is fairly straightforward, consisting only of decoupling capacitors and inputs and outputs. The TDMC PCB in figure B.2 uses push button switches and Schmitt triggers as inputs to the pulse generator. Both PCBs are connected to an oscilloscope through the coaxial cable connector shown.



Figure B.1: Picture of the measurement setup for the ring oscillator



Figure B.2: Picture of the measurement setup for the TDMC

# **List of Figures**

| Figure | Caption |
|--------|---------|
| 2.1  | Signal paths in a) a LOS environment and b) a NLOS environment |
| 2.2  | Top level schematic of template mixing receiver |
| 2.3  | Top level schematic of the thresholded receiver with dual-slope pulse detection |
| 2.4  | Top level schematic of the single threshold receiver topology considered in this thesis |
| 2.5  | Plot of the PDF (a) and CDF (b) of the received noise with $\sigma_n = 1$ |
| 2.6  | Shows the PDF of a noisy input in the absence of a signal with $\sigma_n = 1$ and $\hat{\theta}_t = 1$. The colored area represents the $P_{fa}$. |
| 2.7  | Shows how $P_{fa}$ decreases for increasing values of $\hat{\theta}$ |
| 2.8  | Shows PDFs of input signals with different SNRs. The area to the left of the threshold is the probability of the quantizer missing an incoming pulse |
| 2.9  | $P_{md}$ versus threshold for different SNRs, with $\hat{\theta}$ along the x-axis |
| 2.10 | $P_{fa}$ versus $P_{md}$. Each SNR has its optimal threshold setting, with the lowest total error rate |
| 2.11 | Implementation of strobed sampler. The sampler is implemented as a D-Flip-Flop |
| 2.12 | The delay line sampler |
| 2.13 | Shows the integrating sampler. The counters are one way counters with a clock enable input |
| 3.1  | Concept sketch of a radar system |
| 3.2  | System concept of the Active Echo scheme |
| 3.3  | Implementing the two devices as a part of the PHY-Layer of a WSN node. The front end consists of filter and amplifier functions. A quantizer and pulse generator can also be shared as part of the front end |
| 3.4  | Top level implementation of echo device. The flip-flop and delay element are standard digital building blocks, and the front end (not shown), quantizer and pulse generator can be shared with other parts of the node. |
| 3.5  | Top-level implementation of main device. The TDMC is shown as a simple delay line sampler, but could also be implemented as an integrating sampler or as the body biased TDMC described in a later chapter. |
| 3.6  | Receiver schematic of Two-Way Ranging scheme. The figure is copied from [Lee 02] and included to illustrate the complexity level of the receiver. |
| 3.7  | The different parts of the Active Echo hardware and their respective time domains |
| 4.1  | Transmitted waveform and example waveforms seen by receivers at 1-20 m. Note that the vertical scale is different on the individual plots. |
| 4.2  | Sweep of threshold and time |
| 4.3  | Failure probability, zoomed |
| 4.4  | Probability of a pulse passing the quantizer without detection as a function of threshold and distance, plotted as actual meters to make the plot more understandable. |
| 4.5  | Probability of the quantizer not detecting a pulse as a function of increased distance with $\theta_t = [2,3,4]$ |
| 4.6  | Shows how the delta error is manifested for high and low $SNR_p$ values. The arrows indicate the time of the threshold crossing. |
| 4.7  | Example of a grey-map of the sampler states with $SNR_p = 7$ dB (10 m). The mean value for each sampler is shown with a solid red line and the standard deviation is shown with a dashed line. $\theta_{ED}$ is set to two |
| 4.8  | Grey-maps of the sampler states for a received signal with $SNR_p = 3.5$ dB (15 m) |
| 4.9  | Grey-map showing the output of a clipping sampler with input $SNR_p = 17.5$ dB (3 m) |
| 4.10 | Plot taken from [Stoc 00] showing the transmitted information against $\sigma_n$ for $N = 64$ |
| 4.11 | Simulated RMS error in estimated ToF for increasing SNR and N from 10 to 10 000. |
| 4.12 | The Gaussian pulse and its first 15 derivatives. The pulse shape considered in this thesis is highlighted. |
| 4.13 | Shows how the delta error is seen in the main device. The solid and dashed pulses illustrate the variance in arrival time caused by the delta error. |
| 5.1  | Typical ring oscillator with 5 stages |
| 5.2  | Schematics of TDMC with capacitors as analog integrators |
| 5.3  | Top level of the analog multiplexer |
| 5.4  | Sketch showing the cross section of an NMOS transistor placed in a deep N-well |
| 5.5  | Simulated delay versus body bias, normalized to peak delay at zero bias |
| 5.6  | Sketch showing the cross section of the implemented inverter |
| 5.7  | Measured waveform at the output of the ring oscillator |
| 5.8  | Measured frequency on the y-axis and BB voltage on the NMOS transistor along the x-axis |
| 5.9  | Average gate delay for measured and simulated behavior versus BB on the NMOS transistor |
| 5.10 | Current drawn by chip for different bias voltages. Notice the steep increase in current as the voltage passes the diode voltage of approximately 0.65 V |
| 5.11 | Comparison of simulation and measurement data, both curves normalized to their respective peak delay |
| 5.12 | Standard deviation from measured delay in seconds, based on measurements on 18 different dies. |
| 5.13 | Linear fit to standard deviation in measurements |
| 5.14 | Linear fit to measured frequency |
| 5.15 | Second degree polynomial fit to measured frequency |
| 5.16 | Relative delay difference in parallel chains |
| 5.17 | The implemented TDMC with detectors |
| 5.18 | Schematic for complete circuit. |
| 5.19 | Timing diagram for the implemented TDMC test circuit |
| 5.20 | Schematic of the delay element with sizes |
| 5.21 | Simulation results for the pulse generator |
| 5.22 | Simulation results for the delay element used in the TDMC. |
| 5.23 | Example of a grey-map for the TDMC, showing the PDFs for different inputs. |
| 5.24 | 3D view of the simulation results for the TDMC. |
| 5.25 | Grey-map of simulations as a function of generated delay for the TDMC. |
| 5.26 | Grey-map of simulations as a function of BB for the TDMC. |
| 5.27 | The effect of BB on gate capacitance for the delay element used in the TDMC. |
| 5.28 | 3D view showing the number of valid measurements from the TDMC for all inputs. |
| 5.29 | 3D view showing the mean measurements from the TDMC. |
| 5.30 | 3D view showing the difference between simulations and mean measurements for the TDMC. |
| 5.31 | 3D view showing the simulations, mean measurements and the mean error for the TDMC |
| 5.32 | 3D view showing the average standard deviation of $n_e$ from the TDMC, showing the effect of noise. |
| 5.33 | 3D view showing the standard deviation of $n_e$ for all inputs from the TDMC, including process variations. |
| 5.34 | Grey-map of all the measurements from the TDMC as a function of generated delay. |
| 5.35 | Grey-map of a single chip as a function of generated delay |
| 5.36 | Grey-map of all the measurements from the TDMC as a function of BB. |
| 5.37 | Grey-map of a single chip as a function of BB |
| 5.38 | Mean measurements for all chips as a function of BB, with two standing out from the mean. |
| A.1  | Layout of inverter used in the ring oscillator |
| A.2  | Complete layout of the ring oscillator. To make the picture more comprehensible, the midsection of the structure has been left out, since the entire ring oscillator consists of 201 identical inverters. |
| A.3  | Layout of the delay element implemented in the TDMC. The input is the left blue wire, and the right is the output. The PMOS-transistor is located at the top, while the NMOS-transistor in a p-well is located below. |
| A.4  | The d-flip-flop used as a detector. The inputs are seen as **Clk** and **M1**, and the output is the cyan wire to the left of the lower half. The shape is chosen to fit between two delay elements. |
| A.5  | Two delay elements, two inverters and a d-flip-flop connected as a cell. These cells were stacked to create the delay line. The delay line inputs are on the left side, and the d-flip-flop output is the green wire running to the top |
| A.6  | The RC-delay in the pulse generator. The output inverters and pass gates are seen at the top. The two pulses enter this part on the two blue wires at the bottom, and the delayed pulses leave at the two inverters at the top. The signals from the decoder enter from the right at the top. The thick lines are the supply lines. |
| B.1  | Picture of the measurement setup for the ring oscillator |
| B.2  | Picture of the measurement setup for the TDMC |


### Acronyms

- **ADC** Analog to Digital Converter
- **AoA** Angle Of Arrival
- **ASIC** Application Specific Integrated Circuit
- **AWGN** Additive White Gaussian Noise
- **BB** Body Bias
- **CAD** Computer Aided Design
- **CDF** Cumulative Distribution Function
- **CMOS** Complementary Metal-Oxide Semiconductor
- **CTQA** Continuous Time Quantized Amplitude
- **DSP** Digital Signal Processor
- **DIBL** Drain Induced Barrier Lowering
- **FBB** Forward Body Bias
- **GPS** Global Positioning System
- **GPIB** General Purpose Interface Bus
- **IC** Integrated Circuit
- **LNA** Low-Noise Amplifier
- **LOS** Line of Sight
- **MCU** Micro Controller Unit
- **MEMS** Micro Electro Mechanical Systems
- **MOSFET** Metal-Oxide-Semiconductor Field-Effect Transistor
- **MUI** Multi User Interference
- **NMOS** N-channel MOSFET
- **NLOS** Non Line Of Sight
- **PCB** Printed Circuit Board
- **PRF** Pulse Repetition Frequency
- **PMOS** P-channel MOSFET
- **PSD** Power Spectral Density
- **PDF** Probability Density Function
- **RSS** Received Signal Strength
- **RBB** Reverse Body Bias
- **RF** Radio Frequency
- **SGA** Standard Gaussian Approximation
- **SNR** Signal-to-Noise Ratio
- **SR** Stochastic Resonance
- **SSR** Suprathreshold Stochastic Resonance
- **TDMC** Time Difference Measuring Circuit
- **TDoA** Time Difference Of Arrival
- **ToA** Time of Arrival
- **ToF** Time of Flight
- **UWB** Ultra-Wide Band
- **UWB-IR** Ultra-Wide Band Impulse Radio
- **VCO** Voltage-Controlled Oscillator
- **WSN** Wireless Sensor Network
- **WPAN** Wireless Personal Area Network

# Bibliography

- [Abas 04] M. A. Abas, G. Russel, and D. Kinniment. "Design of sub-10picoseconds on-chip time measurement circuit". In: Design, Automation and Test in Europe Conference and Exhibition. Proceedings, February 2004.
- [Bene 04] M.-G. D. Benedetto and G. Giancola. *Understanding Ultra Wide Band Radio Fundamentals*. Prentice Hall PTR, 1 Ed., 2004.
- [Brya 01] A. Bryant *et al.* "Low-Power CMOS at Vdd=4kT/q". In: *Device Research Conference*, pp. 22–23, jun 2001.
- [Buch 98] M. Bucher et al. "The EPFL-EKV MOSFET Model Equations for Simulation". Tech. Rep., Electronics Laboratories, Swiss Federal Institute of Technology (EPFL), jul 1998.
- [Fang 75] F. F. Fang and H. S. Rupprecht. "High performance MOS integrated circuit using the ion implantation technique". *IEEE Journal of Solid-State Circuits*, Vol. sc-10, No. 4, pp. 205–211, Aug 1975.
- [Forb 73] L. Forbes. "N-channel ion-implanted enhancement/depletion FET circuit and fabrication technology". IEEE Journal of Solid-State Circuits, Vol. sc-8, pp. 226–230, Jun 1973.
- [Frii 46] H. T. Friis. "A Note on a Simple Transmission Formula". In: Proceedings of the Institute of Radio Engineers, pp. 254–256, May 1946.
- [Gezi 05] S. Gezici *et al.* "A Two-Step Time of Arrival Estimation Algorithm for Impulse Radio Ultra Wide Band Systems". In: *Proceedings of European Signal Processing Conference (EUSIPCO)*, Mitsubishi Electric Research Laboratories, September 2005.
- [Gran 06] K. Granhaug et al. "Body-bias Regulator for Ultra Low Power Multifunction CMOS Gates". In: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1255–1258, May 2006.

- [Hjor 06] H. A. Hjortland. *UWB impulse radar in 90nm CMOS*. Master's thesis, Department of Informatics, University of Oslo, 2006.
- [Hu 03] C. Hu et al. BSIM4.3.0 MOSFET Model User's Manual. University of California, Department of Electrical Engineering and Computer Sciences, 2003.
- [John 97] D. Johns and K. Martin. Analog Integrated Circuit Design. John Wiley & Sons, 1997.
- [Kesh 01] A. Keshavarzi *et al.* "Effectiveness of Reverse Body Bias for Leakage Control in Scaled Dual Vt CMOS ICs". International Symposium on Low Power Electronics and Design, pp. 207–212, Aug 2001.
- [Khal 06] D. Khalil et al. "Optimum Sizing of Power Grids for IR Drop". In: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), pp. 481–484, May 2006.
- [Lee 02] J.-Y. Lee and R. A. Scholtz. "Ranging in a Dense Multipath Environment Using an UWB Radio Link". *IEEE Journal on Selected Areas in Communications*, Vol. 20, pp. 1677–1683, December 2002.
- [Limb 05] C. Limbodahl. A spatial RAKE Receiver for Real-Time UWB-IR Applications. Master's thesis, University of Oslo, 2005.
- [Liu 00] X. Liu and S. Mourad. "Performance of Submicron CMOS Devices and Gates with Substrate Biasing". In: *IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 9–12, May 2000.
- [Liu 93] Z.-H. Liu *et al.* "Threshold Voltage Model for Deep-Submicrometer MOSFET". *IEEE Transactions on Electron Devices*, Vol. 40, pp. 86–95, Jan 1993.
- [Meis 05] K. Meisal. CMOS Ultra Wide-Band Impulse Radio Receiver Front-End. Master's thesis, University of Oslo, Department of Informatics, 2005.
- [Mele 04] L. A. Melek et al. "Body-Bias Compensation Technique for Sub-Threshold CMOS Static Logic Gates". In: Symposium on Integrated Circuits and Systems Design (SBCCI), pp. 267–272, Sep 2004.
- [Moli 04] A. F. Molisch *et al.* "IEEE 802.15.4a channel model Final Report". Tech. Rep., IEEE, November 2004.
- [Nare 03] S. Narendra *et al.* "Forward Body Bias For Microprocessors on 130-nm technology generation and beyond". *IEEE Journal of Solid-State Circuits*, Vol. 38, No. 5, pp. 696–701, May 2003.

- [Olaf 07] H. K. Olafsen. Wireless Sensor Network Localization Strategies. Master's thesis, Department of Informatics, University of Oslo, 2007.
- [Oowa 98] Y. Oowaki et al. "A Sub-0.1um Circuit Design with Substrate-over-Biasing". In: Digest of Technical Papers, IEEE International Solid-State Circuits Conference (ISSCC), pp. 88–89, Feb 1998.
- [Oppe 04] I. Oppermann *et al.* *UWB Theory and Applications*. John Wiley & Sons Ltd, 2004.
- [Rice 66] J. R. Rice. "First-occurrence time of high-level crossings in a continuous random process". Journal of the Acoustical Society of America, Vol. 39, No. 2, pp. 323–335, Feb 1966.
- [Sasa 82] N. Sasaki. "Higher harmonics generation in CMOS/SOS Ring Oscillators". IEEE Transactions on Electron Devices, Vol. 29, pp. 280–283, Feb 1982.
- [Sche 00] B. Scheers *et al.* "Time-domain simulation and characterisation of TEM horns using a normalised impulse response". In: *IEE Proc.-Microw. Antennas Propag.*, pp. 463–468, Dec 2000.
- [Seta 95] K. Seta et al. "50% Active-power saving without speed degradation using standby power reduction (SPR) circuit". In: Digest of Technical Papers, IEEE International Solid-State Circuits Conference (ISSCC), pp. 318–319, Feb 1995.
- [Shoc 52] W. Shockley. "A unipolar 'Field effect' Transistor". In: *Proceedings of the Institute of Radio Engineers*, pp. 1365–1376, Nov 1952.
- [Siwi 04] K. Siwiak and D. McKeown. *Ultra-wideband radio technology*. Wiley, 2004.
- [Stoc 00] N. G. Stocks. "Suprathreshold Stochastic Resonance in Multilevel Threshold Systems". *Physical Review Letters*, Vol. 84, No. 11, pp. 2310–2314, Mar 2000.
- [Taub 05] D. Taubenheim *et al.* "Distributed Radiolocation Hardware Core for IEEE 802.15.4". Tech. Rep., Motorola, 2005.
- [Wann 00] C. Wann, J. Harrington, R. Mih, and S. Biesemans. "CMOS with active well bias for low-power and RF/analog applications". In: Symposium on VLSI Technology, Digest of Technical Papers, pp. 158–159, Jun 2000.
- [West 94] N. H. E. Weste and K. Eshraghian. Principles of CMOS VLSI Design, A Systems Perspective. Addison-Wesley-Longman, 2 Ed., 1994.

- [yang 04] L. Yang. "Ultra-wideband Communications: An Idea Whose Time Has Come". *IEEE Signal Processing Magazine*, pp. 26–54, 2004.