32  Measuring Divergence of Investor Opinion

A foundational question in financial economics concerns how differences in investor beliefs affect asset prices and trading activity. In markets where investors hold heterogeneous expectations about a firm’s future cash flows, the aggregation of these divergent views into a single market price becomes a non-trivial exercise with profound implications for asset valuation, return predictability, and market efficiency. The concept of divergence of investor opinion (hereafter DIVOP) has emerged as a central construct in both the accounting and finance literatures, serving as a lens through which researchers examine the information environment of firms, the dynamics of uncertainty resolution, and the nature of market reactions to news.

The theoretical foundations of the DIVOP literature trace back to Miller (1977), who proposed that when investors disagree about the value of a security and short-sale constraints prevent pessimistic investors from fully expressing their views, the market price will reflect the valuation of the most optimistic investors. This leads to systematic overpricing that is increasing in the degree of opinion divergence. The overpricing persists until information events, such as earnings announcements, reduce disagreement and prices converge toward fundamental values (Berkman et al. 2009). Varian (1985) offers an alternative perspective in which divergence of opinion represents an additional risk factor, leading to higher rather than lower expected returns, creating a theoretical tension that has motivated extensive empirical investigation.

The empirical literature on DIVOP has expanded considerably since these seminal contributions. Researchers have documented that divergence of opinion helps explain a range of asset pricing anomalies, including post-earnings announcement drift (Garfinkel and Sokobin 2006; K. L. Anderson, Harris, and So 2007), the cross-sectional return difference between value and growth stocks (Doukas, Kim, and Pantzalis 2004), short- and long-run post-IPO returns (Houge et al. 2001), pre- and post-acquisition stock returns (Alexandridis, Antoniou, and Petmezas 2007), takeover premia (Chatterjee, John, and Yan 2012), and the broad cross-section of stock returns (Diether, Malloy, and Scherbina 2002; Doukas, Kim, and Pantzalis 2006). The explanatory power of DIVOP has been demonstrated using a rich set of empirical proxies, ranging from analyst forecast dispersion and abnormal trading volume to bid-ask spreads and idiosyncratic volatility.

Despite the maturity of the DIVOP literature in developed markets, particularly the United States, its application to emerging markets remains remarkably thin. This gap is especially notable given that the theoretical conditions under which divergence of opinion matters most (namely, binding short-sale constraints, information asymmetry, and heterogeneous investor sophistication) are arguably more prevalent in emerging markets than in their developed counterparts. The Vietnamese equity market presents a compelling laboratory for studying investor disagreement. The market is characterized by several features that amplify the relevance of the DIVOP framework:

  1. Binding short-sale constraints. Short selling was not permitted in Vietnam until January 2025, and even after its introduction, the mechanism remains restricted to a limited set of securities with significant regulatory constraints on execution. This closely mirrors the theoretical setting of Miller (1977), where pessimistic investors are unable to fully express their views through short positions.

  2. Dominance of retail investors. Individual investors account for approximately 80-85% of daily trading volume on HOSE and HNX, compared to roughly 25% in the United States. Retail investors are more susceptible to behavioral biases, sentiment-driven trading, and information processing limitations that naturally give rise to heterogeneous beliefs (Phan et al. 2023).

  3. Information asymmetry and transparency challenges. Despite improvements in disclosure standards, Vietnam’s regulatory framework for corporate reporting remains less stringent than those in developed markets. Selective disclosure, delayed filing of financial statements, and limited enforcement of insider trading regulations create an environment in which investors operate with substantially different information sets (Vo and Phan 2017).

  4. Foreign ownership limits. Caps on foreign ownership (currently 49% for most sectors, with exceptions) create a segmented market where domestic and foreign investors may hold systematically different views about firm value, amplifying the divergence of opinion.

  5. Thin analyst coverage. Whereas a typical S&P 500 firm is followed by 15-25 sell-side analysts, coverage of Vietnamese equities is concentrated among a relatively small number of domestic brokerages and a handful of international research houses. This limits the informativeness of traditional analyst-based DIVOP measures and necessitates greater reliance on market-based proxies.

This chapter provides a methodology for constructing multiple proxies for divergence of investor opinion adapted to the institutional characteristics of the Vietnamese market. We draw on the methodological frameworks established by Garfinkel (2009) and Diether, Malloy, and Scherbina (2002), while introducing modifications that account for the microstructure of Vietnamese exchanges, the \(T+2\) settlement cycle, the absence (until recently) of short selling, and the availability of data through domestic financial platforms. Specifically, we construct and analyze volume-based proxies (Unexplained Volume, DTO, and Standardized Unexplained Volume, SUV) alongside proxies based on analyst forecast dispersion, bid-ask spreads, and return volatility.

For each proxy, we describe the theoretical motivation, the data requirements, the construction methodology adapted for Vietnamese data, the empirical properties observed in the Vietnamese cross-section, and the practical considerations that researchers should bear in mind when employing these measures. We pay particular attention to issues that are specific to emerging markets, including thin trading, corporate action adjustments, exchange-specific microstructure effects, and the interplay between foreign ownership constraints and measures of investor disagreement.


33 Theoretical Framework

33.1 The Miller (1977) Overpricing Hypothesis

The canonical model of divergence of opinion and asset pricing begins with Miller (1977). Miller’s central insight is simple: in a market where investors hold heterogeneous beliefs about the future payoffs of a risky asset and short-sale constraints prevent some investors from acting on their pessimistic views, the equilibrium price will be set by the subset of investors who are most optimistic about the asset’s value. The severity of overpricing is increasing in both the degree of opinion divergence and the stringency of short-sale constraints. Formally, if investor \(i\) assigns a valuation \(V_i\) to a security, the market price \(P\) satisfies:

\[ P = E[V_i \mid V_i \geq V^*] \]

where \(V^*\) is the marginal investor’s valuation, which exceeds the unconditional mean valuation \(E[V_i]\) whenever short-sale constraints bind for some investors. The degree of overpricing is:

\[ \text{Overpricing} = P - E[V_i] = E[V_i \mid V_i \geq V^*] - E[V_i] \]

which is positive and increasing in the dispersion of the distribution of \(V_i\) (i.e., divergence of opinion) and in \(V^*\) (i.e., the severity of short-sale constraints).
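
As a numerical illustration of the overpricing expression, suppose (purely as an assumption for this sketch, which the chapter itself does not impose) that valuations are normally distributed, \(V_i \sim N(\mu, \sigma^2)\). The truncated mean then has the closed form \(E[V_i \mid V_i \geq V^*] = \mu + \sigma\,\phi(a)/(1-\Phi(a))\) with \(a = (V^*-\mu)/\sigma\). The function name `miller_overpricing` and all parameter values below are hypothetical.

```python
import math

def miller_overpricing(mu: float, sigma: float, v_star: float) -> float:
    """Overpricing E[V | V >= V*] - E[V] when V ~ N(mu, sigma^2).

    Normality is an illustrative assumption, not part of Miller's argument.
    """
    a = (v_star - mu) / sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)  # standard normal density at a
    sf = 0.5 * math.erfc(a / math.sqrt(2.0))                 # P(Z >= a)
    return sigma * pdf / sf

mu, v_star = 100.0, 105.0
for sigma in (5.0, 10.0, 20.0):
    # Wider dispersion of valuations (higher sigma) raises the overpricing
    print(f"sigma={sigma:5.1f}  overpricing={miller_overpricing(mu, sigma, v_star):6.2f}")
```

Under this assumption the overpricing rises with both \(\sigma\) and \(V^*\), in line with the comparative statics stated above.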

Miller’s model generates several testable predictions:

  • Cross-sectional prediction: Stocks with higher divergence of opinion should have lower subsequent returns as prices gradually correct toward fundamental values.
  • Time-series prediction: Information events that reduce disagreement (e.g., earnings announcements) should be associated with negative abnormal returns for high-DIVOP stocks, as the “optimism premium” dissipates.
  • Interaction prediction: The overpricing effect should be strongest among stocks that simultaneously exhibit high divergence of opinion and binding short-sale constraints.

33.2 Alternative Theoretical Perspectives

Varian (1985) proposes an alternative framework in which divergence of opinion acts as a risk factor. If investors are risk-averse and disagreement represents genuine uncertainty about future payoffs, then higher dispersion of beliefs should be associated with higher expected returns as compensation for bearing the additional risk. This creates a sharp empirical dichotomy: the Miller hypothesis predicts a negative DIVOP-return relation, whereas the Varian model predicts a positive relation.

The distinction between these theories hinges critically on the market microstructure and institutional setting (Table 33.1).

Table 33.1: Summary of theoretical predictions for the DIVOP-return relation under different assumptions

| Theoretical Framework | Short-Sale Constraints | DIVOP-Return Relation | Key Mechanism |
|---|---|---|---|
| Miller (1977) | Binding | Negative | Optimistic bias in price |
| Varian (1985) | Non-binding | Positive | Risk premium for uncertainty |
| Hong and Stein (2003) | Binding, gradual info | Negative, time-varying | Slow diffusion of bearish views |
| Scheinkman and Xiong (2003) | Binding, overconfidence | Negative | Speculative bubble premium |

Hong and Stein (2003) extend Miller’s framework by incorporating gradual information diffusion. In their model, bearish information is impounded into prices more slowly than bullish information because short-sale constraints raise the cost of acting on negative views. This generates momentum-like patterns in which high-DIVOP stocks exhibit positive short-run returns (as optimists push prices up) followed by negative long-run returns (as bearish information eventually reaches the market).

Scheinkman and Xiong (2003) introduce an additional dimension by noting that when investors are overconfident about their private signals and short-sale constraints bind, stock prices contain a “speculative bubble” component that reflects the option value of reselling the asset to a future investor who may be even more optimistic. This model predicts that both high trading volume and high price volatility should be associated with overpricing, providing a theoretical basis for using volume-based and volatility-based DIVOP proxies.

33.3 Relevance to the Vietnamese Market

The Vietnamese equity market provides an unusually clean setting for testing the Miller hypothesis. It operated without any short-selling mechanism from its inception in 2000 through January 2025, a full quarter-century during which the first necessary condition of Miller’s model (binding short-sale constraints) was satisfied by regulation rather than by market frictions. Even after the introduction of covered short selling in 2025, the mechanism remains restricted to securities meeting specific liquidity and market capitalization thresholds, and the regulatory environment imposes borrowing requirements that significantly raise the cost of shorting relative to developed markets.

The dominance of retail investors amplifies the second necessary condition (i.e., heterogeneous beliefs). Research on the Vietnamese market has documented significant herding behavior (Vo and Phan 2017; Vo 2015), sentiment-driven trading (Phan et al. 2023; Nguyen and Pham 2018), and information asymmetry between domestic and foreign investors (Vo 2017). These behavioral characteristics naturally generate wider dispersion of investor valuations compared to markets dominated by institutional investors with access to similar analytical frameworks and information sources.

Table 33.2 compares key institutional features relevant to the DIVOP framework between Vietnam and the United States.

Table 33.2: Institutional comparison of Vietnam and the United States relevant to divergence of opinion

| Feature | Vietnam (HOSE/HNX) | United States (NYSE/NASDAQ) |
|---|---|---|
| Short selling | Introduced Jan 2025 (limited) | Permitted (Reg SHO since 2005) |
| Retail investor share of volume | ~80-85% | ~25% |
| Settlement cycle | T+2 (T+1 planned for 2026) | T+1 (since May 2024) |
| Daily price limits | \(\pm\)7% (HOSE), \(\pm\)10% (HNX) | None |
| Foreign ownership cap | 49% (most sectors) | None |
| Average analyst coverage (VN30) | 5-10 analysts | 15-25 analysts |
| Mandatory quarterly reporting | Yes (since 2012) | Yes |
| Options/derivatives market | VN30 Index Futures (since 2017) | Extensive options/futures |

The presence of daily price limits (\(\pm\) 7% on HOSE and \(\pm\) 10% on HNX) creates an additional mechanism through which divergence of opinion can be amplified. When a stock hits its price limit, investors who wish to trade in the direction of the limit are unable to do so, leading to accumulated unfilled orders and delayed price discovery. This institutional feature may create short-term spikes in measured DIVOP that reflect limit-induced friction rather than genuine disagreement. We address this issue in our empirical methodology by flagging limit-hit days and conducting robustness checks that exclude these observations.

34 Data Sources and Sample Construction

34.1 Data Sources

The construction of DIVOP proxies for the Vietnamese market requires daily stock-level trading data and, for the analyst dispersion measures, individual analyst forecast data. We source all data from DataCore.vn, which provides coverage of all securities listed on HOSE, HNX, and the UPCoM (Unlisted Public Company Market) exchange. Table 34.1 summarizes the datasets and key variables used in this study.

Table 34.1: Data sources and key variables for DIVOP proxy construction

| Dataset | Key Variables | Frequency |
|---|---|---|
| Daily Stock Trading | Close price, high, low, open, volume, shares outstanding, adjusted price, bid, ask | Daily |
| Corporate Actions | Dividends, stock splits, bonus issues, rights offerings | Event-based |
| Company Information | Exchange code, industry classification (ICB), listing date, delisting date | Static/Periodic |
| Analyst Forecasts | Individual analyst EPS forecasts, announcement dates, fiscal period end, analyst ID, broker name | Per estimate |
| Market Index | VN-Index daily returns, VN30 returns, HNX-Index returns | Daily |
| Foreign Ownership | Foreign buy/sell volume, foreign ownership percentage, remaining foreign room | Daily |

34.2 Sample Construction

We begin by loading the required libraries and defining the configuration parameters used throughout the chapter:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sklearn.linear_model import LinearRegression
from scipy import stats as scipy_stats
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# =============================================================================
# Configuration Parameters
# =============================================================================
# Users can modify these parameters to adjust the methodology
CONFIG = {
    # Sample period
    'beg_date': '2007-01-01',
    'end_date': '2024-12-31',
    
    # Estimation windows (in trading days)
    'est_window': 60,          # Rolling window for SUV and volatility
    'detrend_window': 180,     # Window for DTO detrending median
    'lag': 7,                  # Lag for DTO detrending
    'gap': 5,                  # Gap between estimation period and event date
    
    # Filters
    'min_price': 1000,         # Minimum price in VND
    'min_volume_days': 0.8,    # Min fraction of non-zero volume days in window
    'min_analysts': 3,         # Minimum number of analysts for DISP
    'max_spread_pct': 0.50,    # Maximum bid-ask spread as fraction of midpoint
    'forecast_carry_days': 105,# Days to carry forward stale analyst forecasts
    
    # Exchange identifiers
    'exchanges': ['HOSE', 'HNX'],
    
    # Price limit thresholds (for flagging)
    'price_limit_hose': 0.07,
    'price_limit_hnx': 0.10,
}

print("Configuration parameters loaded successfully.")
print(f"Sample period: {CONFIG['beg_date']} to {CONFIG['end_date']}")
print(f"Estimation window: {CONFIG['est_window']} trading days")
print(f"Detrending window: {CONFIG['detrend_window']} trading days")
Configuration parameters loaded successfully.
Sample period: 2007-01-01 to 2024-12-31
Estimation window: 60 trading days
Detrending window: 180 trading days

The sample universe includes all common stocks (ordinary shares) listed on HOSE and HNX during the period January 2007 through December 2024. We begin in 2007 rather than at market inception (2000 for HOSE, 2005 for HNX) for two reasons. First, the early years of the Vietnamese market were characterized by an extremely small number of listed firms (fewer than 30 on HOSE through 2005), making cross-sectional analysis unreliable. Second, data quality and consistency improve substantially after the market expansion of 2006-2007, during which the number of listed firms on HOSE grew from approximately 40 to over 100.

We apply the following filters to construct the analysis sample:

  1. Security type filter. We retain only common stocks (ordinary shares), excluding preferred shares, exchange-traded funds (ETFs), covered warrants, and certificates of deposit. This is analogous to the standard filter in the U.S. literature that restricts to CRSP share codes 10 and 11.

  2. Exchange filter. We include stocks listed on HOSE and HNX but exclude UPCoM securities in our baseline analysis. UPCoM is a registration-based trading venue with less stringent listing requirements and substantially lower liquidity, which may introduce noise into volume-based and spread-based measures. We include UPCoM in robustness checks.

  3. Price filter. We exclude stock-day observations with closing prices below 1,000 VND. This threshold serves the same purpose as the “penny stock” exclusion common in U.S. studies (typically $1 or $5 thresholds) and helps mitigate the influence of extreme percentage returns and spreads at very low price levels.

  4. Minimum trading activity. For volume-based measures, we require that a stock has non-zero trading volume on at least 80% of trading days within each estimation window. This filter eliminates the most thinly traded securities for which turnover-based measures would be unreliable.
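
The minimum-trading-activity filter can be sketched as a rolling computation per ticker. This is a minimal illustration rather than the chapter's implementation: the function name `flag_min_trading_activity` and the output columns `nonzero_frac` and `active` are hypothetical, the input columns follow the schema documented in this chapter, and the sketch assumes each ticker has one row per trading day (including zero-volume days).

```python
import pandas as pd

def flag_min_trading_activity(df: pd.DataFrame, window: int = 60,
                              min_frac: float = 0.8) -> pd.DataFrame:
    """Add 'active' = True when the share of non-zero-volume days over the
    trailing `window` rows of the same ticker is at least `min_frac`."""
    df = df.sort_values(['ticker', 'date']).copy()
    traded = (df['volume'] > 0).astype(float)
    # Rolling mean of a 0/1 indicator = fraction of days with trading
    frac = traded.groupby(df['ticker']).transform(
        lambda s: s.rolling(window, min_periods=window).mean()
    )
    df['nonzero_frac'] = frac
    df['active'] = frac >= min_frac   # incomplete windows (NaN) compare as False
    return df

# Toy example: one ticker, 5-day window, zero volume on two of ten days
toy = pd.DataFrame({
    'ticker': ['AAA'] * 10,
    'date': pd.date_range('2024-01-01', periods=10, freq='B'),
    'volume': [100, 0, 200, 300, 150, 0, 120, 90, 80, 70],
})
out = flag_min_trading_activity(toy, window=5, min_frac=0.8)
print(out[['date', 'volume', 'nonzero_frac', 'active']])
```

Rows flagged `active == False` would be excluded from the volume-based measures only, leaving the other proxies unaffected.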

def load_daily_data(config):
    """
    Load daily stock trading data from DataCore.vn.
    
    In practice, this function connects to the DataCore API or reads
    from a local database/CSV. Here we document the expected schema.
    
    Expected columns:
    - ticker: str, stock ticker symbol (e.g., 'VCB', 'HPG', 'VNM')
    - date: datetime, trading date
    - open, high, low, close: float, daily OHLC prices (VND)
    - volume: int, trading volume (shares)
    - shares_outstanding: int, total shares outstanding
    - adjusted_close: float, price adjusted for corporate actions
    - adj_factor: float, cumulative adjustment factor
    - bid, ask: float, best bid/ask at close
    - exchange: str, exchange code ('HOSE', 'HNX', 'UPCOM')
    - industry_icb: str, ICB industry classification code
    - foreign_buy_vol, foreign_sell_vol: int, foreign investor volumes
    - foreign_ownership_pct: float, foreign ownership percentage
    """
    # =========================================================================
    # Replace with actual DataCore API call:
    # from datacore import Client
    # client = Client(api_key='YOUR_KEY')
    # df = client.daily_stock(
    #     start=config['beg_date'], end=config['end_date'],
    #     exchanges=config['exchanges']
    # )
    # =========================================================================
    print("Connect to DataCore.vn and load daily stock data.")
    print("Expected schema: ticker, date, open, high, low, close, volume,")
    print("  shares_outstanding, adjusted_close, adj_factor, bid, ask,")
    print("  exchange, industry_icb, foreign_buy_vol, foreign_sell_vol,")
    print("  foreign_ownership_pct")
    return None  # Replace with actual data


def apply_sample_filters(df, config):
    """Apply sequential sample construction filters."""
    print("\n=== Sample Construction ===")
    n0 = len(df)
    
    # Date filter
    df = df[(df['date'] >= config['beg_date']) &
            (df['date'] <= config['end_date'])].copy()
    print(f"[1] Date filter: {len(df):,} obs (from {n0:,})")
    
    # Exchange filter
    df = df[df['exchange'].isin(config['exchanges'])].copy()
    print(f"[2] Exchange filter ({config['exchanges']}): {len(df):,} obs")
    
    # Compute daily returns from adjusted prices BEFORE the price filter, so
    # that dropping low-priced days does not turn the next observed return
    # into a multi-day return
    df = df.sort_values(['ticker', 'date'])
    df['ret'] = df.groupby('ticker')['adjusted_close'].pct_change()
    
    # Price filter
    df = df[df['close'] >= config['min_price']].copy()
    print(f"[3] Price >= {config['min_price']:,} VND: {len(df):,} obs")
    
    # Flag price limit hits
    df['limit_hit'] = (
        ((df['exchange'] == 'HOSE') &
         (df['ret'].abs() >= config['price_limit_hose'] - 0.001)) |
        ((df['exchange'] == 'HNX') &
         (df['ret'].abs() >= config['price_limit_hnx'] - 0.001))
    )
    
    n_tickers = df['ticker'].nunique()
    print(f"\nFinal sample: {len(df):,} stock-day obs, "
          f"{n_tickers} unique tickers")
    print(f"Limit-hit days: {df['limit_hit'].sum():,} "
          f"({100*df['limit_hit'].mean():.2f}%)")
    return df

34.3 Corporate Action Adjustments

Proper adjustment for corporate actions is critical for volume-based DIVOP measures, because stock splits, bonus share issues, and rights offerings change the number of shares outstanding and can create artificial spikes in measured turnover. We use cumulative adjustment factors that account for stock dividends (bonus shares), stock splits, and rights offerings; cash dividends affect the price adjustment only. With these factors we construct adjusted volume and adjusted shares outstanding:

\[ \text{AdjVolume}_{i,t} = \text{Volume}_{i,t} \times \text{CumAdjFactor}_{i,t} \]

\[ \text{AdjSharesOut}_{i,t} = \text{SharesOut}_{i,t} \times \text{CumAdjFactor}_{i,t} \]

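Because the same factor scales both series, a stock’s own turnover ratio is unchanged by the adjustment and remains comparable across corporate action events. The factor matters when adjusted volume and adjusted shares outstanding are aggregated across stocks, as in the market-wide turnover used for DTO.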
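
A short arithmetic check makes the role of the adjustment factor concrete. All numbers, including the adjustment factors, are hypothetical, and the helper `turnover` is introduced only for this illustration.

```python
def turnover(volume: float, shares_out: float, adj_factor: float = 1.0) -> float:
    """Turnover computed from adjusted volume and adjusted shares outstanding."""
    return (volume * adj_factor) / (shares_out * adj_factor)

# Hypothetical stock-day: 2 million shares traded, 100 million outstanding
raw = turnover(2_000_000, 100_000_000)
adj = turnover(2_000_000, 100_000_000, adj_factor=0.5)  # factor cancels in the ratio
print(raw, adj)

# Aggregating across stocks: the factors now act as weights
stocks = [  # (volume, shares_out, adj_factor), all hypothetical
    (2_000_000, 100_000_000, 0.5),
    (1_000_000, 200_000_000, 1.0),
]
mkt_raw = sum(v for v, s, f in stocks) / sum(s for v, s, f in stocks)
mkt_adj = sum(v * f for v, s, f in stocks) / sum(s * f for v, s, f in stocks)
print(mkt_raw, mkt_adj)
```

The stock-level ratio is identical with and without the factor, while the aggregate ratio differs, so the choice of adjustment convention mainly affects market-wide turnover.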

def adjust_for_corporate_actions(df):
    """Apply cumulative adjustment factors to volume and shares outstanding."""
    df = df.copy()
    df['adj_volume'] = df['volume'] * df['adj_factor']
    df['adj_shares_out'] = df['shares_outstanding'] * df['adj_factor']
    
    # Daily turnover ratio
    df['turnover'] = np.where(
        df['adj_shares_out'] > 0,
        df['adj_volume'] / df['adj_shares_out'],
        np.nan
    )
    
    # Treat implausibly high turnover (> 50% of shares outstanding) as missing
    extreme = df['turnover'] > 0.50
    if extreme.any():
        print(f"Warning: {extreme.sum()} obs with turnover > 50%, set to NaN")
        df.loc[extreme, 'turnover'] = np.nan
    
    return df

34.4 Trading Calendar Construction

The rolling regression approach for SUV and volatility requires a trading calendar that ensures each estimation window contains exactly the specified number of trading days. We construct this directly from observed trading dates.

def build_trading_calendar(df, config):
    """
    Map each trading date to its estimation window [est_start, est_end].
    
    For date t, the estimation window runs from
    t - gap - est_window to t - gap - 1 (in trading-day terms).
    """
    trading_dates = sorted(df['date'].unique())
    trading_dates = pd.Series(trading_dates)
    
    est_window = config['est_window']
    gap = config['gap']
    offset = est_window + gap
    
    records = []
    for i in range(offset, len(trading_dates)):
        records.append({
            'date': trading_dates.iloc[i],
            'est_start': trading_dates.iloc[i - gap - est_window],
            'est_end': trading_dates.iloc[i - gap - 1]
        })
    
    calendar = pd.DataFrame(records)
    print(f"Trading calendar: {len(calendar)} dates, "
          f"{calendar['date'].min()} to {calendar['date'].max()}")
    return calendar
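
The same date-to-window mapping can also be expressed with shifts of the sorted date series, which avoids the explicit loop. This is a sketch under the same window convention as above; the function name `build_calendar_shifted` is hypothetical.

```python
import pandas as pd

def build_calendar_shifted(trading_dates, est_window: int = 60,
                           gap: int = 5) -> pd.DataFrame:
    """Map each date t to [t - gap - est_window, t - gap - 1] in trading days."""
    dates = (pd.Series(pd.to_datetime(trading_dates))
               .drop_duplicates()
               .sort_values()
               .reset_index(drop=True))
    cal = pd.DataFrame({
        'date': dates,
        'est_start': dates.shift(gap + est_window),  # first day of the window
        'est_end': dates.shift(gap + 1),             # last day of the window
    })
    # Early dates without a full window have NaT bounds and are dropped
    return cal.dropna().reset_index(drop=True)

# Toy check on 100 business days: every window should span est_window days
dates = pd.bdate_range('2024-01-01', periods=100)
cal = build_calendar_shifted(dates, est_window=60, gap=5)
first = cal.iloc[0]
n_days = int(((dates >= first['est_start']) & (dates <= first['est_end'])).sum())
print(len(cal), n_days)
```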

35 Volume-Based DIVOP Proxies

35.1 Theoretical Motivation

Trading volume has long been recognized as a natural proxy for divergence of investor opinion. The no-trade theorem of Milgrom and Stokey (1982) shows that, in a rational expectations setting, investors who share common priors will not trade on differences in private information alone; by contraposition, trading volume beyond that explained by liquidity or hedging motives must reflect some form of heterogeneous beliefs. Harris and Raviv (1993) and Kandel and Pearson (1995) formalize this intuition, showing that trading volume is positively related to the dispersion of investors’ prior beliefs and to the degree to which public information is differentially interpreted.

The challenge in using raw trading volume as a DIVOP proxy is that volume is also driven by factors unrelated to disagreement, including portfolio rebalancing, liquidity needs, tax-loss selling, and index reconstitution effects. Garfinkel (2009) proposes two approaches to extract the disagreement component from raw volume. The first, Unexplained Volume (DTO), removes market-wide volume effects and secular trends. The second, Standardized Unexplained Volume (SUV), additionally controls for the information content of returns through a firm-specific time-series regression of turnover on signed returns, isolating the “pure disagreement” component of trading activity.

35.2 Unexplained Volume (DTO)

35.2.1 Construction Methodology

The construction of the Unexplained Volume measure proceeds in four steps.

Step 1: Compute firm-level daily turnover. For each stock \(i\) on day \(t\):

\[ \text{Turn}_{i,t} = \frac{\text{AdjVolume}_{i,t}}{\text{AdjSharesOut}_{i,t}} \]

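Step 2: Compute market-wide turnover. We calculate aggregate turnover across all common stocks as total adjusted volume divided by total adjusted shares outstanding, which is equivalent to a shares-outstanding-weighted average of firm-level turnover: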

\[ \text{MktTurn}_{t} = \frac{\sum_{i} \text{AdjVolume}_{i,t}}{\sum_{i} \text{AdjSharesOut}_{i,t}} \]

Unlike the U.S. methodology that computes market turnover across NYSE/AMEX stocks only and applies a scaling adjustment for NASDAQ securities (following A.-M. Anderson and Dyl 2005), we compute market turnover across all HOSE and HNX common stocks without any exchange-specific volume scaling. Both Vietnamese exchanges operate as order-driven markets (HOSE uses continuous order matching; HNX uses a combination of continuous matching and periodic call auctions) without the dealer-market double-counting issue that necessitates the NASDAQ volume adjustment in U.S. studies.

Step 3: Compute market-adjusted turnover.

\[ \text{MATO}_{i,t} = \text{Turn}_{i,t} - \text{MktTurn}_{t} \]

Step 4: Detrend by rolling median. To remove secular trends in firm-specific trading activity:

\[ \text{DTO}_{i,t} = \text{MATO}_{i,t} - \text{Median}_{180}(\text{MATO}_{i,t-7}) \]

where \(\text{Median}_{180}(\text{MATO}_{i,t-7})\) is the median of market-adjusted turnover over the 180-trading-day window ending 7 trading days before date \(t\). The 7-day lag keeps the current day’s turnover, and that of the immediately preceding days, out of its own detrending baseline.

def compute_market_turnover(df):
    """Compute daily market-wide turnover: aggregate adjusted volume
    divided by aggregate adjusted shares outstanding across all stocks."""
    totals = df.groupby('date')[['adj_volume', 'adj_shares_out']].sum()
    # Guard against a zero denominator (turnover undefined -> NaN)
    shares = totals['adj_shares_out'].where(totals['adj_shares_out'] > 0)
    mkt_turn = (totals['adj_volume'] / shares).rename('market_turnover')
    return mkt_turn.reset_index()


def compute_dto(df, config):
    """
    Construct Unexplained Volume (DTO).
    
    Steps:
    1. Subtract market turnover -> MATO
    2. Rolling 180-day median of MATO (lagged 7 days) -> trend
    3. DTO = MATO - trend
    """
    detrend_window = config['detrend_window']
    lag = config['lag']
    
    # Market turnover
    mkt_turn = compute_market_turnover(df)
    df = df.merge(mkt_turn, on='date', how='left')
    
    # Market-adjusted turnover
    df['mato'] = df['turnover'] - df['market_turnover']
    
    # Rolling median with lag, computed per stock
    df = df.sort_values(['ticker', 'date'])
    
    # transform keeps the result aligned with df's index, including the
    # edge case of a single ticker, where groupby.apply can change shape
    df['mato_trend'] = df.groupby('ticker')['mato'].transform(
        lambda s: s.rolling(
            window=detrend_window,
            min_periods=int(detrend_window * 0.5)
        ).median().shift(lag)
    )
    
    # DTO
    df['dto'] = df['mato'] - df['mato_trend']
    
    print("DTO construction complete.")
    print(f"  Non-missing: {df['dto'].notna().sum():,}")
    print(f"  Mean: {df['dto'].mean():.6f}, Std: {df['dto'].std():.6f}")
    return df

35.2.2 Vietnam-Specific Considerations for DTO

Several features of the Vietnamese market require attention when constructing DTO:

  1. No NASDAQ-type volume adjustment needed. Both HOSE and HNX are order-driven auction markets. The double-counting adjustment applied to NASDAQ securities in the U.S. literature is not necessary.

  2. Thinly traded stocks. A substantial fraction of listed Vietnamese stocks, particularly on HNX, may have zero volume on many trading days. For stocks with intermittent trading, the rolling median may be biased toward zero, making DTO less informative. We require at least 80% non-zero volume days in each estimation window.

  3. Price limit effects on volume. When a stock hits its daily price limit, unfilled orders accumulate and recorded volume may understate true clearing volume. The following day often shows a “catch-up” effect. Researchers should consider flagging limit-hit days.

  4. Foreign investor trading decomposition. DataCore provides volume by investor type (foreign versus domestic). Researchers may wish to construct separate DTO measures for foreign and domestic volume, or use the foreign-to-domestic volume ratio as an additional dimension of disagreement.
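
As one possible starting point for the foreign versus domestic decomposition, the sketch below computes the foreign share of daily volume from the `foreign_buy_vol` and `foreign_sell_vol` columns in the schema above. The convention of dividing two-sided foreign volume by twice total volume (each trade has one buyer and one seller, so the share lies between 0 and 1) is an assumption of this sketch, and the function name `foreign_volume_share` is hypothetical.

```python
import pandas as pd

def foreign_volume_share(df: pd.DataFrame) -> pd.Series:
    """Foreign share of two-sided volume: (buy + sell) / (2 * total volume)."""
    two_sided = df['foreign_buy_vol'] + df['foreign_sell_vol']
    total = df['volume'].where(df['volume'] > 0)   # zero-volume days -> NaN
    return (two_sided / (2.0 * total)).rename('foreign_share')

# Toy data (all numbers are made up)
toy = pd.DataFrame({
    'ticker': ['AAA', 'AAA', 'BBB'],
    'volume': [1_000_000, 0, 500_000],
    'foreign_buy_vol': [200_000, 0, 50_000],
    'foreign_sell_vol': [100_000, 0, 150_000],
})
toy['foreign_share'] = foreign_volume_share(toy)
print(toy)
```

Multiplying total turnover by this share (and by its complement) would give foreign and domestic turnover series that could each be passed through the DTO steps separately.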

35.3 Standardized Unexplained Volume (SUV)

35.3.1 Construction Methodology

The Standardized Unexplained Volume measure, proposed by Garfinkel (2009), isolates the disagreement component of volume by explicitly controlling for the information content of returns. The insight is that trading volume has both a liquidity component and an informedness component correlated with the magnitude and sign of returns. By regressing turnover on signed returns and extracting the standardized residual, SUV captures volume attributable to disagreement after controlling for both liquidity trends and information-driven trading.

For each stock \(i\), on each trading date \(t\), we estimate using data from the estimation window \([\tau_1, \tau_2]\):

\[ \text{Turn}_{i,s} = \alpha_i + \beta_i^{+} \cdot \text{RetPos}_{i,s} + \beta_i^{-} \cdot \text{RetNeg}_{i,s} + \epsilon_{i,s}, \quad s \in [\tau_1, \tau_2] \tag{35.1}\]

where \(\text{RetPos}_{i,s} = |r_{i,s}| \cdot \mathbf{1}(r_{i,s} > 0)\) and \(\text{RetNeg}_{i,s} = |r_{i,s}| \cdot \mathbf{1}(r_{i,s} < 0)\).

The Standardized Unexplained Volume on date \(t\) is:

\[ \text{SUV}_{i,t} = \frac{\text{Turn}_{i,t} - \hat{\text{Turn}}_{i,t}}{\hat{\sigma}_{\epsilon,i}} \tag{35.2}\]

where \(\hat{\text{Turn}}_{i,t}\) is the predicted turnover and \(\hat{\sigma}_{\epsilon,i}\) is the RMSE from Equation 35.1.

The asymmetric specification with separate coefficients for positive and negative returns reflects that the volume-return relation differs by return sign. In the U.S., buying pressure tends to generate more volume than selling pressure due to short-sale frictions. In Vietnam, where short selling was unavailable until 2025, this asymmetry should be even more pronounced because all selling activity was constrained to existing shareholders.
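
Before turning to the full rolling implementation, a single-window illustration of Equations 35.1 and 35.2 may help fix ideas. The data are synthetic, all parameter values are made up, and `np.linalg.lstsq` is used here only to keep the example dependency-light.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 60-day estimation window for one stock (all numbers are made up)
n = 60
ret = rng.normal(0.0, 0.02, size=n)
ret_pos = np.clip(ret, 0.0, None)    # |r| on up days, 0 otherwise
ret_neg = np.clip(-ret, 0.0, None)   # |r| on down days, 0 otherwise
turn = 0.004 + 0.10 * ret_pos + 0.05 * ret_neg + rng.normal(0.0, 0.001, size=n)

# OLS of Equation 35.1: Turn = alpha + beta_pos * RetPos + beta_neg * RetNeg
X = np.column_stack([np.ones(n), ret_pos, ret_neg])
coef, *_ = np.linalg.lstsq(X, turn, rcond=None)
resid = turn - X @ coef
rmse = np.sqrt(np.mean(resid ** 2))  # in-sample root mean squared residual

# Event day: a +3% return accompanied by turnover of 1.2%
r_evt, turn_evt = 0.03, 0.012
x_evt = np.array([1.0, max(r_evt, 0.0), max(-r_evt, 0.0)])
suv = (turn_evt - x_evt @ coef) / rmse   # Equation 35.2
print(f"alpha={coef[0]:.4f}, beta_pos={coef[1]:.3f}, "
      f"beta_neg={coef[2]:.3f}, SUV={suv:.2f}")
```

A large positive SUV here means that event-day turnover is well above what the stock's own return-volume relation would predict.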

def compute_suv(df, calendar, config):
    """
    Compute Standardized Unexplained Volume via rolling regressions.
    
    For each stock-date, regress Turn on RetPos and RetNeg over the
    estimation window, then compute SUV = (actual - predicted) / RMSE.
    """
    est_window = config['est_window']
    min_obs = int(est_window * config['min_volume_days'])
    
    # Prepare signed return components; a missing return stays missing
    # rather than being silently treated as a zero return
    df = df.copy()
    df['ret_pos'] = df['ret'].clip(lower=0)      # |r| when r > 0, else 0
    df['ret_neg'] = (-df['ret']).clip(lower=0)   # |r| when r < 0, else 0
    
    results = []
    grouped = {t: g for t, g in df.groupby('ticker')}
    
    for _, cal_row in calendar.iterrows():
        dt = cal_row['date']
        est_s, est_e = cal_row['est_start'], cal_row['est_end']
        
        for ticker, tdata in grouped.items():
            # Estimation window
            est = tdata[
                (tdata['date'] >= est_s) & (tdata['date'] <= est_e)
            ].dropna(subset=['turnover', 'ret_pos', 'ret_neg'])
            
            # Require non-zero volume on at least min_volume_days of the window
            if (est['turnover'] > 0).sum() < min_obs:
                continue
            
            # Event date: need both turnover and a return on the event day
            evt = tdata[tdata['date'] == dt]
            if evt.empty or evt[['turnover', 'ret_pos', 'ret_neg']].isna().values.any():
                continue
            
            # OLS: Turn = alpha + beta_pos * RetPos + beta_neg * RetNeg
            X = est[['ret_pos', 'ret_neg']].values
            y = est['turnover'].values
            
            reg = LinearRegression().fit(X, y)
            y_hat = reg.predict(X)
            rmse = np.sqrt(np.mean((y - y_hat) ** 2))
            
            if rmse <= 0:
                continue
            
            # Predict and standardize for event date
            X_evt = evt[['ret_pos', 'ret_neg']].values
            pred = reg.predict(X_evt)[0]
            actual = evt['turnover'].values[0]
            suv = (actual - pred) / rmse
            
            results.append({
                'ticker': ticker, 'date': dt,
                'suv': suv,
                'predicted_turnover': pred,
                'rmse_turn': rmse,
                'n_est': len(est),
                'alpha_turn': reg.intercept_,
                'beta_pos': reg.coef_[0],
                'beta_neg': reg.coef_[1],
                'r_squared_turn': reg.score(X, y),
            })
    
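    suv_df = pd.DataFrame(results)
    if suv_df.empty:
        # No stock-date met the estimation requirements; avoid a KeyError below
        print("SUV: 0 stock-date obs")
        return suv_df
    print(f"SUV: {len(suv_df):,} stock-date obs")
    print(f"  Mean: {suv_df['suv'].mean():.4f}, "
          f"Median: {suv_df['suv'].median():.4f}")
    return suv_df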

35.3.2 Interpreting the SUV Regression Coefficients

The estimated coefficients from Equation 35.1 are informative about market microstructure. Garfinkel (2009) reports \(\hat{\beta}^{+} > \hat{\beta}^{-}\) for most U.S. stocks. In Vietnam, we expect this asymmetry to be even stronger because:

  • No short selling (pre-2025): All selling is by existing shareholders, limiting volume response to negative returns.
  • T+2 settlement: Investors cannot immediately reinvest sale proceeds, further dampening sell-side volume.
  • Price limits: The \(\pm\) 7% (HOSE) and \(\pm\) 10% (HNX) daily limits truncate the return distribution, compressing the range of both regressors.

Researchers should report summary statistics of \((\hat{\alpha}, \hat{\beta}^{+}, \hat{\beta}^{-}, R^2)\) across the cross-section and over time.

def suv_diagnostics(suv_df):
    """Report cross-sectional summary of SUV regression parameters."""
    print("\n=== SUV Regression Diagnostics ===")
    
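    # Include R-squared only if the column is present in suv_df
    params = [p for p in ['alpha_turn', 'beta_pos', 'beta_neg',
                          'r_squared_turn'] if p in suv_df.columns]
    print(suv_df[params].describe(
        percentiles=[.05, .25, .50, .75, .95]
    ).T.to_string(float_format='{:.6f}'.format))
    
    # Asymmetry summary (descriptive, not a formal test)
    diff = suv_df['beta_pos'] - suv_df['beta_neg']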
    print(f"\nbeta_pos - beta_neg: mean = {diff.mean():.6f}, "
          f"frac > 0 = {(diff > 0).mean():.3f}")

36 Volatility-Based DIVOP Proxies

36.1 Total Return Volatility

36.1.1 Theoretical Motivation

Stock return volatility serves as a proxy for divergence of opinion through several channels. Shalen (1993) develops a model in which both volume and volatility are increasing in the dispersion of investor beliefs. Scheinkman and Xiong (2003) predict that higher volatility reflects the speculative trading component driven by overconfident investors who disagree about value. Empirically, Boehme, Danielsen, and Sorescu (2006) and Chatterjee, John, and Yan (2012) use idiosyncratic volatility as a DIVOP proxy and find it positively correlated with other disagreement measures and negatively associated with subsequent returns when short-sale constraints bind.

36.1.2 Construction

Total return volatility is the standard deviation of daily returns over the rolling estimation window:

\[ \text{VOLATILITY}_{i,t} = \sqrt{\frac{1}{N_i - 1} \sum_{s \in [\tau_1, \tau_2]} (r_{i,s} - \bar{r}_i)^2} \tag{36.1}\]

where \(N_i\) is the number of non-missing return observations for stock \(i\) in the window \([\tau_1, \tau_2]\).

36.2 Idiosyncratic Volatility (IVOL)

Idiosyncratic volatility isolates firm-specific return variation by removing the systematic component explained by market movements. We compute IVOL from the residuals of a market model:

\[ r_{i,s} = \alpha_i + \beta_i \cdot r_{m,s} + \epsilon_{i,s}, \quad s \in [\tau_1, \tau_2] \tag{36.2}\]

\[ \text{IVOL}_{i,t} = \text{Std}(\hat{\epsilon}_{i,s}) \tag{36.3}\]

Researchers may extend this to the Fama and French (1993) three-factor model or the Fama and French (2015) five-factor model using Vietnamese factor portfolios constructed elsewhere in this book. A richer factor model yields IVOL estimates that better isolate truly idiosyncratic disagreement, at the cost of requiring factor portfolio construction.

def compute_volatility(df, calendar, config):
    """
    Compute total return volatility and idiosyncratic volatility
    via rolling estimation windows.
    
    Total vol = std(returns) in window.
    IVOL = std(residuals) from market model regression.
    """
    est_window = config['est_window']
    min_obs = int(est_window * config['min_volume_days'])
    
    # Value-weighted market return
    def _vw_ret(g):
        valid = g.dropna(subset=['ret'])
        if valid.empty:
            return np.nan
        w = valid['adj_shares_out'] * valid['close']
        return np.average(valid['ret'], weights=w)
    
    mkt_ret = df.groupby('date').apply(_vw_ret).reset_index()
    mkt_ret.columns = ['date', 'mkt_ret']
    df = df.merge(mkt_ret, on='date', how='left')
    
    results = []
    grouped = {t: g for t, g in df.groupby('ticker')}
    
    for _, cal_row in calendar.iterrows():
        dt = cal_row['date']
        est_s, est_e = cal_row['est_start'], cal_row['est_end']
        
        for ticker, tdata in grouped.items():
            est = tdata[
                (tdata['date'] >= est_s) & (tdata['date'] <= est_e)
            ].dropna(subset=['ret', 'mkt_ret'])
            
            if len(est) < min_obs:
                continue
            
            # Total volatility
            total_vol = est['ret'].std()
            
            # Market model -> IVOL
            X = est[['mkt_ret']].values
            y = est['ret'].values
            reg = LinearRegression().fit(X, y)
            resid = y - reg.predict(X)
            ivol = np.std(resid, ddof=1)
            
            results.append({
                'ticker': ticker, 'date': dt,
                'total_volatility': total_vol,
                'idio_volatility': ivol,
                'market_beta': reg.coef_[0],
                'market_alpha': reg.intercept_,
                'r_squared_mm': reg.score(X, y),
                'n_vol': len(est),
            })
    
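    vol_df = pd.DataFrame(results)
    if vol_df.empty:
        # No stock-date met the estimation requirements; avoid a KeyError below
        print("Volatility: 0 stock-date obs")
        return vol_df
    print(f"Volatility: {len(vol_df):,} stock-date obs")
    print(f"  Total vol (ann. mean): "
          f"{vol_df['total_volatility'].mean() * np.sqrt(252):.4f}")
    print(f"  IVOL (ann. mean): "
          f"{vol_df['idio_volatility'].mean() * np.sqrt(252):.4f}")
    return vol_df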
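
The multi-factor extension mentioned before `compute_volatility` changes only the regressor matrix. The following is a minimal sketch under stated assumptions: `factor_model_ivol` is a hypothetical helper, not part of the chapter's pipeline, and the factor series are simulated placeholders rather than actual Vietnamese factor portfolios.

```python
import numpy as np

def factor_model_ivol(ret, factors):
    """
    Hypothetical helper: residual std from regressing `ret` (T,) on an
    intercept and `factors` (T,) or (T, K), with a degrees-of-freedom
    correction for the number of estimated coefficients.
    """
    ret = np.asarray(ret, dtype=float)
    F = np.asarray(factors, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    # Keep only days where the return and every factor are observed
    ok = np.isfinite(ret) & np.isfinite(F).all(axis=1)
    y, F = ret[ok], F[ok]
    X = np.column_stack([np.ones(len(y)), F])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return resid.std(ddof=X.shape[1])

# Simulated placeholders standing in for MKT, SMB, and HML daily returns
rng = np.random.default_rng(42)
T = 500
factors = rng.normal(0, 0.01, size=(T, 3))
ret = factors @ np.array([1.0, 1.5, -1.0]) + rng.normal(0, 0.01, T)

ivol_mkt = factor_model_ivol(ret, factors[:, 0])   # market model only
ivol_3f = factor_model_ivol(ret, factors)          # three-factor model
print(f"IVOL (market model): {ivol_mkt:.4f}, IVOL (3 factors): {ivol_3f:.4f}")
```

In this simulated example the market-model IVOL absorbs the omitted factor exposures, so it exceeds the three-factor IVOL. How large that gap is for Vietnamese stocks is an empirical question.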

36.2.1 Vietnam-Specific Considerations for Volatility

  1. Price limits compress measured volatility. Daily limits of \(\pm\) 7% (HOSE) and \(\pm\) 10% (HNX) mechanically truncate the return distribution, leading to underestimation of true volatility. On limit-hit days, the true equilibrium return may exceed the observed return. Researchers should be aware that volatility-based DIVOP measures may be downward-biased for stocks that frequently hit limits.

  2. VN-Index concentration. The VN-Index is highly concentrated: the top 10 stocks often account for 50-60% of index weight. For small- and mid-cap stocks, an equal-weighted market return or a composite HOSE+HNX index may provide a better market factor in Equation 36.2.

  3. Thin trading and non-synchronous returns. For thinly traded stocks, consecutive zero-return days can depress measured volatility. The Dimson (1979) adjustment (including lagged and leading market returns in the market model) may help correct for non-synchronous trading bias in the beta estimate, though its effect on IVOL is typically small.
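
The Dimson adjustment in item 3 can be sketched as follows. This is an illustration only: the function name `dimson_beta` and the choice of one lag and one lead are assumptions, and the inputs are plain arrays of aligned daily returns rather than the panel used elsewhere in this chapter.

```python
import numpy as np

def dimson_beta(ret, mkt_ret, n_lags=1, n_leads=1):
    """
    Hypothetical helper: market model augmented with lagged and leading
    market returns. Returns (summed beta, residual std).
    """
    ret = np.asarray(ret, dtype=float)
    mkt = np.asarray(mkt_ret, dtype=float)
    T = len(ret)
    # Usable days need n_lags earlier and n_leads later market returns
    idx = np.arange(n_lags, T - n_leads)
    cols = [mkt[idx + k] for k in range(-n_lags, n_leads + 1)]
    X = np.column_stack([np.ones(len(idx))] + cols)
    y = ret[idx]
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    # The Dimson beta is the sum of slopes across lags, day 0, and leads
    beta_dimson = coefs[1:].sum()
    ivol = resid.std(ddof=X.shape[1])
    return beta_dimson, ivol

# Simulated stock that reacts to the market partly with a one-day delay
rng = np.random.default_rng(0)
mkt = rng.normal(0, 0.01, 500)
ret = 0.4 * mkt + 0.6 * np.roll(mkt, 1) + rng.normal(0, 0.02, 500)
beta, ivol = dimson_beta(ret, mkt)
print(f"Dimson beta: {beta:.3f}, residual std: {ivol:.4f}")
```

A contemporaneous-only market model would pick up just the same-day slope in this simulation, while the summed coefficients also capture the delayed reaction.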

37 Spread-Based and Liquidity DIVOP Proxies

37.1 Bid-Ask Spread (BASPREAD)

37.1.1 Theoretical Motivation

The bid-ask spread reflects the adverse selection costs faced by limit order providers. When investors hold heterogeneous beliefs, each trade is more likely to convey private information, raising the adverse selection component of the spread. Handa, Schwartz, and Tiwari (2003) show that in order-driven markets the spread widens when divergence of opinion increases because limit order providers face greater risk of being picked off by informed traders. Chung and Zhang (2014) demonstrate that closing bid-ask spreads from daily data provide a reliable approximation to intraday effective spreads.

37.1.2 Construction

We compute the proportional bid-ask spread using end-of-day quote data:

\[ \text{BASPREAD}_{i,t} = \frac{\text{Ask}_{i,t} - \text{Bid}_{i,t}}{\text{Midpoint}_{i,t}} \tag{37.1}\]

where \(\text{Midpoint}_{i,t} = (\text{Ask}_{i,t} + \text{Bid}_{i,t}) / 2\). When end-of-day bid and ask are unavailable, we use the daily high-low range as a fallback. Following Chung and Zhang (2014), we delete observations where both Bid and Ask are zero, and where the spread exceeds 50% of the midpoint.

37.2 Amihud Illiquidity (ILLIQ)

The Amihud (2002) ratio measures the price impact of order flow:

\[ \text{ILLIQ}_{i,t} = \frac{|r_{i,t}|}{\text{DolVol}_{i,t}} \tag{37.2}\]

where \(\text{DolVol}_{i,t} = \text{Volume}_{i,t} \times \text{Price}_{i,t}\) (in billions VND for scaling). Higher ILLIQ reflects greater information asymmetry. We average daily ratios over monthly horizons and use the log transformation due to heavy right skew.

def compute_spread_and_illiq(df, config):
    """Compute bid-ask spread (BASPREAD) and Amihud illiquidity."""
    df = df.copy()
    
    # --- Bid-Ask Spread ---
    df['midpoint_ba'] = (df['ask'] + df['bid']) / 2
    df['baspread_ba'] = np.where(
        (df['ask'] > 0) & (df['bid'] > 0) & (df['midpoint_ba'] > 0),
        (df['ask'] - df['bid']) / df['midpoint_ba'], np.nan
    )
    
    # Fallback: high/low range
    df['midpoint_hl'] = (df['high'] + df['low']) / 2
    df['baspread_hl'] = np.where(
        (df['high'] > 0) & (df['low'] > 0) & (df['midpoint_hl'] > 0),
        (df['high'] - df['low']) / df['midpoint_hl'], np.nan
    )
    
    df['baspread'] = df['baspread_ba'].fillna(df['baspread_hl'])
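    # Use the high/low midpoint whenever the quote-based spread was unusable,
    # so that midpoint and baspread always come from the same source
    df['midpoint'] = df['midpoint_ba'].where(
        df['baspread_ba'].notna(), df['midpoint_hl']
    )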
    
    # Chung & Zhang (2014) filters
    bad = (df['baspread'].isna()) | \
          (df['baspread'] > config['max_spread_pct']) | \
          (df['baspread'] < 0)
    df.loc[bad, 'baspread'] = np.nan
    
    # --- Amihud Illiquidity ---
    df['dollar_vol'] = df['volume'] * df['close'] / 1e9
    df['amihud_daily'] = np.where(
        df['dollar_vol'] > 0,
        np.abs(df['ret']) / df['dollar_vol'], np.nan
    )
    
    print(f"BASPREAD: {df['baspread'].notna().sum():,} valid obs, "
          f"mean = {df['baspread'].mean():.6f}")
    print(f"AMIHUD: {df['amihud_daily'].notna().sum():,} valid obs, "
          f"mean = {df['amihud_daily'].mean():.6f}")
    return df


def compute_amihud_monthly(df):
    """Monthly Amihud = mean daily |ret|/dollar_vol (min 15 days)."""
    df = df.copy()
    df['ym'] = df['date'].dt.to_period('M')
    agg = df.groupby(['ticker', 'ym']).agg(
        illiq_mean=('amihud_daily', 'mean'),
        n_days=('amihud_daily', 'count'),
    ).reset_index()
    agg = agg[agg['n_days'] >= 15].copy()
    agg['log_illiq'] = np.log(agg['illiq_mean'] + 1e-10)
    return agg

37.2.1 Vietnam-Specific Considerations for Spread and Liquidity

  1. Tick size schedule. Vietnam uses variable tick sizes: 10 VND (prices < 10,000), 50 VND (10,000–49,950), and 100 VND (≥ 50,000) on HOSE. These impose a floor on quoted spreads for low-priced stocks. Researchers should be cautious interpreting cross-price-decile spread variation as reflecting opinion divergence rather than tick-size mechanics.

  2. Order-driven market structure. Both HOSE and HNX are pure order-driven markets where public limit orders provide liquidity. This makes the Chung and Zhang (2014) CRSP-based spread approximation appropriate.

  3. Lot size requirements. HOSE requires 100-share standard lots for continuous trading. For high-priced stocks, the standard lot represents a large capital commitment, potentially inflating quoted spreads relative to effective trading costs.

  4. Call auction effects. Opening and closing sessions on HOSE use periodic call auctions, which can produce bid-ask quotes that differ substantially from continuous-trading spreads.
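
The tick-size floor described in item 1 can be made concrete with a short calculation. The sketch below uses only the HOSE schedule quoted above. The helper names are hypothetical, and the floor assumes the bid sits at the given price with the ask exactly one tick higher.

```python
def hose_tick_size(price):
    """Tick size in VND for a HOSE stock, per the schedule quoted above."""
    if price < 10_000:
        return 10
    if price < 50_000:
        return 50
    return 100

def min_relative_spread(price):
    """Smallest possible quoted spread (one tick) relative to the midpoint."""
    tick = hose_tick_size(price)
    # Bid at `price`, ask one tick above it
    mid = price + tick / 2
    return tick / mid

for p in (5_000, 9_990, 10_000, 49_950, 50_000, 200_000):
    print(f"{p:>8,} VND: tick {hose_tick_size(p):>3}, "
          f"floor {min_relative_spread(p) * 1e4:.1f} bps")
```

The floor jumps at each tick-size boundary, which is why spread differences across price levels can reflect the schedule rather than disagreement.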

38 Analyst Forecast Dispersion

38.1 Theoretical Motivation

Analyst forecast dispersion, the cross-sectional standard deviation of individual analysts’ earnings forecasts, is the most direct measure of divergence of opinion. Unlike market-based proxies that capture disagreement indirectly, forecast dispersion directly measures disagreement among informed market participants. Abarbanell, Lanen, and Verrecchia (1995) establish the theoretical basis, and Diether, Malloy, and Scherbina (2002) demonstrate that stocks with higher analyst forecast dispersion earn lower subsequent returns, consistent with the Miller overpricing hypothesis.

38.2 Data Challenges in Vietnam

Constructing analyst forecast dispersion in Vietnam presents substantial challenges relative to the U.S.:

  • Coverage breadth. While I/B/E/S covers over 4,000 U.S. companies, only 100–150 Vietnamese firms typically have coverage by at least 3 analysts, concentrated among VN30 constituents.
  • Data sources. Analyst forecasts are available from DataCore.vn, FiinPro, Bloomberg, and Refinitiv. The choice of source affects coverage and timeliness.
  • Forecast staleness. With limited coverage, forecasts may go unrevised for months. Following I/B/E/S methodology, we carry each forecast forward for a maximum of 105 days.

38.3 Construction Methodology

The construction proceeds as follows:

  1. Clean individual forecasts. Remove observations where the announcement date falls after the review date. Keep only annual EPS forecasts. For each analyst-ticker-fiscal period, retain only the latest forecast per calendar month.
  2. Handle stopped and excluded estimates. Remove forecasts where the analyst has left the brokerage or the estimate has been excluded from consensus.
  3. Carry forward with staleness control. Each forecast is valid until the earliest of: (a) the next forecast by the same analyst, (b) 105 days after the announcement, or (c) the actual earnings announcement date.
  4. Expand to monthly frequency. For each ticker-month, identify all valid outstanding forecasts and compute dispersion.
  5. Compute scaled measures:

\[ \text{DISP1}_{i,m} = \frac{\text{Std}(\hat{\text{EPS}}_{i,m}^{(a)})}{|\text{Mean}(\hat{\text{EPS}}_{i,m}^{(a)})|} \qquad \text{DISP2}_{i,m} = \frac{\text{Std}(\hat{\text{EPS}}_{i,m}^{(a)})}{\bar{P}_{i,m}} \]

def construct_analyst_dispersion(forecasts_df, price_df, config):
    """
    Construct analyst forecast dispersion measures.
    
    Parameters
    ----------
    forecasts_df : pd.DataFrame
        Individual analyst forecasts with: ticker, analyst_id, broker,
        fpedats, anndats, revdats, value (EPS), anndats_act.
    price_df : pd.DataFrame
        Monthly price: ticker, month, mean_price.
    config : dict
        With min_analysts, forecast_carry_days.
    """
    carry_days = config['forecast_carry_days']
    min_analysts = config['min_analysts']
    
    df = forecasts_df.copy()
    df = df[df['anndats'] <= df['revdats']].copy()
    df = df.dropna(subset=['fpedats', 'anndats', 'value'])
    
    # Latest forecast per analyst-month
    df['ym'] = df['anndats'].dt.to_period('M')
    df = df.sort_values(
        ['ticker', 'fpedats', 'analyst_id', 'ym', 'anndats', 'revdats']
    )
    df = df.groupby(['ticker', 'fpedats', 'analyst_id', 'ym']).tail(1)
    
    # Carry-forward end date: sort chronologically within each
    # analyst-ticker-fiscal period so shift(-1) returns the NEXT forecast date
    df = df.sort_values(
        ['ticker', 'analyst_id', 'fpedats', 'anndats']
    )
    df['next_ann'] = df.groupby(
        ['ticker', 'analyst_id', 'fpedats']
    )['anndats'].shift(-1)
    
    def _carry_end(row):
        candidates = [row['anndats'] + pd.Timedelta(days=carry_days)]
        if pd.notna(row.get('next_ann')):
            candidates.append(row['next_ann'])
        if pd.notna(row.get('anndats_act')):
            candidates.append(row['anndats_act'])
        return min(candidates)
    
    df['carry_end'] = df.apply(_carry_end, axis=1)
    
    # Monthly expansion
    months = pd.period_range(config['beg_date'], config['end_date'], freq='M')
    records = []
    for month in months:
        me = month.to_timestamp(how='end')
        valid = df[(df['anndats'] <= me) & (df['carry_end'] > me)].copy()
        valid = valid[valid['fpedats'] > me]
        valid = valid.sort_values(['ticker', 'analyst_id', 'anndats'])
        valid = valid.groupby(['ticker', 'analyst_id']).tail(1)
        
        disp = valid.groupby('ticker').agg(
            n_analysts=('analyst_id', 'nunique'),
            mean_fcst=('value', 'mean'),
            std_fcst=('value', 'std'),
        ).reset_index()
        disp['month'] = month
        records.append(disp)
    
    if not records:
        return pd.DataFrame()
    disp_df = pd.concat(records, ignore_index=True)
    
    # Scaled measures
    disp_df['disp1'] = np.where(
        disp_df['mean_fcst'].abs() > 0,
        disp_df['std_fcst'] / disp_df['mean_fcst'].abs(), np.nan
    )
    disp_df = disp_df.merge(price_df, on=['ticker', 'month'], how='left')
    disp_df['disp2'] = np.where(
        disp_df['mean_price'] > 0,
        disp_df['std_fcst'] / disp_df['mean_price'], np.nan
    )
    disp_df['disp_raw'] = disp_df['std_fcst']
    
    out = disp_df[disp_df['n_analysts'] >= min_analysts].copy()
    print(f"DISP: {len(out):,} ticker-months (>= {min_analysts} analysts)")
    print(f"  Mean analysts: {out['n_analysts'].mean():.1f}")
    return out

38.4 Scaling Considerations

Following Cheong and Thomas (2011), we note that each scaling choice has pitfalls. DISP1 (scaled by absolute mean forecast) can produce extreme values when the mean forecast approaches zero, which is common for Vietnamese firms near breakeven. DISP2 (scaled by price) introduces a mechanical negative correlation between price and scaled dispersion. We recommend reporting all three versions (DISP1, DISP2, and unscaled DISP_RAW with \(\ln(\text{Price})\) as an additional control), and winsorizing DISP1 at the 1st and 99th percentiles.
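
A small numerical illustration of these scaling pitfalls, with the recommended winsorization, is sketched below. All numbers are invented for illustration, `winsorize_series` is a hypothetical helper, and the winsorization is applied to a pooled simulated sample; in practice it might be applied cross-sectionally by month.

```python
import numpy as np
import pandas as pd

def winsorize_series(s, lower=0.01, upper=0.99):
    """Clip a series at its own lower/upper quantiles (NaNs are left as-is)."""
    lo, hi = s.quantile(lower), s.quantile(upper)
    return s.clip(lower=lo, upper=hi)

# Invented ticker-months: identical forecast std, different mean forecasts
toy = pd.DataFrame({
    'std_fcst':   [200.0, 200.0, 200.0, 200.0],
    'mean_fcst':  [4000.0, 1000.0, 50.0, -5.0],      # VND per share
    'mean_price': [60000.0, 20000.0, 15000.0, 12000.0],
})
toy['disp1'] = toy['std_fcst'] / toy['mean_fcst'].abs()
toy['disp2'] = toy['std_fcst'] / toy['mean_price']
print(toy)

# Heavy-tailed simulated DISP1 values to show the effect of winsorizing
rng = np.random.default_rng(1)
disp1 = pd.Series(np.abs(rng.standard_cauchy(5000)))
disp1_w = winsorize_series(disp1)
print(f"max before: {disp1.max():.1f}, max after: {disp1_w.max():.1f}")
```

The near-breakeven rows dominate DISP1 even though forecast disagreement in VND is identical across rows, while DISP2 stays within a narrow range.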

Warning: Caution on Analyst Dispersion in Thin-Coverage Markets

With typical coverage of 5–10 analysts per firm in Vietnam (versus 15–25 in the U.S.), forecast dispersion is estimated with substantially greater noise. A dispersion measure from 3 analysts has a very different sampling distribution than one from 20. Always include the number of analysts as a control and test robustness with varying minimum-analyst thresholds (3, 5, 7).
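
The difference in sampling noise can be illustrated with a short simulation. This is a sketch under a strong assumption: forecasts are drawn independently from a standard normal distribution, so the true dispersion equals 1. Actual analyst forecasts need not be normal or independent, and `simulate_dispersion_noise` is a hypothetical helper.

```python
import numpy as np

def simulate_dispersion_noise(n_analysts, n_sims=20_000, seed=0):
    """
    Draw forecasts from N(0, 1), so the true dispersion is 1, and return
    the standard deviation of the sample dispersion across simulations.
    """
    rng = np.random.default_rng(seed)
    forecasts = rng.normal(0.0, 1.0, size=(n_sims, n_analysts))
    sample_std = forecasts.std(axis=1, ddof=1)
    return sample_std.std()

noise = {n: simulate_dispersion_noise(n) for n in (3, 5, 10, 20)}
for n, sd in noise.items():
    print(f"{n:>2} analysts: std of estimated dispersion = {sd:.3f}")
```

Under these assumptions the estimate from 3 analysts is far noisier than the one from 20, which motivates the analyst-count control and the threshold robustness checks.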

39 Cross-Sectional Correlations Among DIVOP Proxies

An important empirical question is the degree to which the various DIVOP proxies capture the same underlying construct. If divergence of opinion is a well-defined latent variable, we expect positive correlations among all proxies, though correlations need not be high since each captures a different facet of disagreement.

def compute_divop_correlations(merged_df, proxies=None):
    """
    Compute and visualize Spearman correlations among DIVOP proxies.
    We use rank correlations because many proxies are right-skewed.
    """
    if proxies is None:
        proxies = [
            'dto', 'suv', 'total_volatility', 'idio_volatility',
            'baspread', 'amihud_daily', 'disp1', 'disp2'
        ]
    available = [p for p in proxies if p in merged_df.columns]
    data = merged_df[available].dropna()
    
    n = len(available)
    rho_mat = np.eye(n)
    p_mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho, p = scipy_stats.spearmanr(
                data[available[i]], data[available[j]]
            )
            rho_mat[i, j] = rho_mat[j, i] = rho
            p_mat[i, j] = p_mat[j, i] = p
    
    labels = {'dto': 'DTO', 'suv': 'SUV',
              'total_volatility': 'VOL', 'idio_volatility': 'IVOL',
              'baspread': 'SPREAD', 'amihud_daily': 'ILLIQ',
              'disp1': 'DISP1', 'disp2': 'DISP2'}
    pretty = [labels.get(c, c) for c in available]
    corr_df = pd.DataFrame(rho_mat, index=pretty, columns=pretty)
    
    # Heatmap
    fig, ax = plt.subplots(figsize=(9, 7))
    mask = np.triu(np.ones_like(corr_df, dtype=bool), k=1)
    sns.heatmap(
        corr_df, mask=mask, annot=True, fmt='.3f',
        cmap='RdBu_r', center=0, vmin=-0.4, vmax=0.7,
        square=True, linewidths=0.5,
        cbar_kws={'shrink': 0.8, 'label': 'Spearman ρ'}, ax=ax
    )
    ax.set_title('Spearman Correlations Among DIVOP Proxies\n'
                  'Vietnamese Equity Market', fontsize=13, fontweight='bold')
    plt.tight_layout()
    plt.savefig('divop_correlations.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    return corr_df

39.0.1 Expected Correlation Patterns

Based on U.S. evidence and theory, we expect:

Table 39.1: Expected correlation structure among DIVOP proxies

| Pair | Expected | Rationale |
|---|---|---|
| DTO × SUV | High positive | Both capture abnormal volume; SUV refines DTO |
| VOL × IVOL | High positive | IVOL is a subset of total volatility |
| SPREAD × ILLIQ | Moderate-high positive | Both capture information asymmetry |
| Volume × Volatility | Moderate positive | Shalen (1993) links both to belief dispersion |
| Analyst × Market-based | Weak-moderate positive | Different investor populations |

40 Descriptive Statistics and Cross-Sectional Properties

40.1 Summary Statistics

def descriptive_statistics(merged_df):
    """Comprehensive descriptive statistics for DIVOP proxies."""
    proxies = {
        'dto': 'Unexplained Volume (DTO)',
        'suv': 'Std Unexplained Volume (SUV)',
        'total_volatility': 'Total Return Volatility',
        'idio_volatility': 'Idiosyncratic Volatility',
        'baspread': 'Bid-Ask Spread',
        'amihud_daily': 'Amihud Illiquidity',
        'disp1': 'Analyst Disp (mean-scaled)',
        'disp2': 'Analyst Disp (price-scaled)',
    }
    avail = {k: v for k, v in proxies.items() if k in merged_df.columns}
    rows = []
    for col, label in avail.items():
        s = merged_df[col].dropna()
        rows.append({
            'Proxy': label, 'N': f'{len(s):,}',
            'Mean': f'{s.mean():.6f}', 'Std': f'{s.std():.6f}',
            'P5': f'{s.quantile(.05):.6f}',
            'Median': f'{s.median():.6f}',
            'P95': f'{s.quantile(.95):.6f}',
            'Skew': f'{s.skew():.2f}',
            'Kurt': f'{s.kurtosis():.2f}',
        })
    stats = pd.DataFrame(rows).set_index('Proxy')
    print("\n" + "=" * 90)
    print("Descriptive Statistics of DIVOP Proxies")
    print("Vietnamese Equity Market, HOSE and HNX")
    print("=" * 90)
    print(stats.to_string())
    return stats

40.2 DIVOP by Firm Characteristics

def divop_by_size(merged_df):
    """Mean DIVOP proxies by market-cap quintile."""
    df = merged_df.copy()
    df['mkt_cap'] = df['close'] * df['shares_outstanding']
    df['size_q'] = df.groupby('date')['mkt_cap'].transform(
        lambda x: pd.qcut(x, 5,
            labels=['Q1 Small','Q2','Q3','Q4','Q5 Large'],
            duplicates='drop')
    )
    proxies = ['dto','suv','total_volatility','idio_volatility',
               'baspread','amihud_daily']
    avail = [p for p in proxies if p in df.columns]
    tab = df.groupby('size_q')[avail].mean()
    print("\n=== Mean DIVOP by Size Quintile ===")
    print(tab.to_string(float_format='{:.6f}'.format))
    return tab

def divop_by_exchange(merged_df):
    """Compare mean DIVOP across HOSE and HNX."""
    proxies = ['dto','suv','total_volatility','idio_volatility',
               'baspread','amihud_daily']
    avail = [p for p in proxies if p in merged_df.columns]
    tab = merged_df.groupby('exchange')[avail].mean()
    print("\n=== Mean DIVOP by Exchange ===")
    print(tab.to_string(float_format='{:.6f}'.format))
    return tab

40.3 Time-Series Evolution

def plot_divop_timeseries(merged_df):
    """Plot monthly cross-sectional median DIVOP with crisis shading."""
    df = merged_df.copy()
    df['ym'] = df['date'].dt.to_period('M')
    proxies = ['dto','suv','total_volatility','baspread']
    avail = [p for p in proxies if p in df.columns]
    monthly = df.groupby('ym')[avail].median()
    monthly.index = monthly.index.to_timestamp()
    
    fig, axes = plt.subplots(len(avail), 1,
        figsize=(13, 3.5*len(avail)), sharex=True)
    if len(avail) == 1: axes = [axes]
    
    labels = {'dto':'DTO','suv':'SUV',
              'total_volatility':'Volatility','baspread':'Spread'}
    colors = ['#1976D2','#388E3C','#F57C00','#D32F2F']
    
    for i, (proxy, ax) in enumerate(zip(avail, axes)):
        ax.plot(monthly.index, monthly[proxy],
                color=colors[i], linewidth=1.3)
        ax.set_ylabel(labels.get(proxy, proxy), fontsize=10)
        ax.grid(True, alpha=0.25)
        for s, e, c in [('2008-01','2009-06','red'),
                         ('2020-01','2020-12','orange'),
                         ('2022-09','2023-06','purple')]:
            ax.axvspan(pd.Timestamp(s), pd.Timestamp(e),
                        alpha=0.1, color=c)
    
    axes[0].set_title(
        'Time-Series of DIVOP Proxies\n'
        'Monthly Cross-Sectional Median, HOSE & HNX',
        fontsize=13, fontweight='bold')
    from matplotlib.patches import Patch
    axes[-1].legend(handles=[
        Patch(facecolor='red', alpha=.2, label='GFC 2008-09'),
        Patch(facecolor='orange', alpha=.2, label='COVID-19'),
        Patch(facecolor='purple', alpha=.2, label='Bond Crisis 2022-23'),
    ], loc='upper right', fontsize=8)
    plt.tight_layout()
    plt.savefig('divop_timeseries.png', dpi=300, bbox_inches='tight')
    plt.show()

41 Putting It All Together

def build_divop_dataset(config):
    """
    Master pipeline: load data, construct all DIVOP proxies,
    merge into a single stock-date panel.
    """
    df = load_daily_data(config)
    df = apply_sample_filters(df, config)
    df = adjust_for_corporate_actions(df)
    calendar = build_trading_calendar(df, config)
    
    df = compute_dto(df, config)
    suv_df = compute_suv(df, calendar, config)
    vol_df = compute_volatility(df, calendar, config)
    df = compute_spread_and_illiq(df, config)
    
    # Merge
    base = df[['ticker','date','ret','close','volume',
                'shares_outstanding','exchange','industry_icb',
                'foreign_ownership_pct','turnover',
                'mato','dto','baspread','amihud_daily','limit_hit']].copy()
    
    if not suv_df.empty:
        base = base.merge(
            suv_df[['ticker','date','suv','predicted_turnover']],
            on=['ticker','date'], how='left')
    if not vol_df.empty:
        base = base.merge(
            vol_df[['ticker','date','total_volatility',
                     'idio_volatility','market_beta']],
            on=['ticker','date'], how='left')
    
    print(f"\n=== Final DIVOP Dataset ===")
    print(f"Shape: {base.shape}")
    print(f"Tickers: {base['ticker'].nunique()}")
    return base

42 Empirical Applications

42.1 Application 1: DIVOP and the Cross-Section of Returns

The fundamental test of the Miller hypothesis is whether stocks with higher divergence of opinion earn lower subsequent returns. We implement Fama-MacBeth cross-sectional regressions:

\[ r_{i,t+1:t+h} = \gamma_{0,t} + \gamma_{1,t} \cdot \text{DIVOP}_{i,t} + \gamma_{2,t}' \mathbf{X}_{i,t} + \varepsilon_{i,t} \]

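where \(\mathbf{X}_{i,t}\) includes controls such as market beta, log market capitalization, and log book-to-market ratio. The Miller hypothesis predicts \(\bar{\gamma}_1 < 0\), whereas the risk-based view of Varian (1985) predicts \(\bar{\gamma}_1 > 0\). The implementation below defaults to beta and size; book-to-market can be passed through `controls` when it is available in the panel.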

def fama_macbeth_divop(merged_df, divop_proxy='suv',
                        controls=None, horizon=21):
    """
    Fama-MacBeth cross-sectional regressions.
    Miller predicts gamma_1 < 0; Varian predicts gamma_1 > 0.
    """
    if controls is None:
        controls = ['market_beta', 'log_mktcap']
    
    df = merged_df.copy()
    df = df.sort_values(['ticker', 'date'])
    df['fwd_ret'] = df.groupby('ticker')['ret'].transform(
        lambda x: x.shift(-1).rolling(horizon).sum().shift(-(horizon-1))
    )
    df['log_mktcap'] = np.log(
        df['close'] * df['shares_outstanding'] + 1
    )
    
    reg_vars = ['fwd_ret', divop_proxy] + \
               [c for c in controls if c in df.columns]
    df_reg = df[['ticker','date'] + reg_vars].dropna()
    
    from numpy.linalg import lstsq
    results = []
    for date, cross in df_reg.groupby('date'):
        if len(cross) < 30: continue
        y = cross['fwd_ret'].values
        X_cols = [divop_proxy] + [c for c in controls if c in cross.columns]
        X = np.column_stack([np.ones(len(cross)), cross[X_cols].values])
        try:
            coefs, _, _, _ = lstsq(X, y, rcond=None)
            results.append({
                'date': date, 'intercept': coefs[0],
                f'gamma_{divop_proxy}': coefs[1], 'n': len(cross),
            })
        except Exception: continue
    
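    fm = pd.DataFrame(results)
    gc = f'gamma_{divop_proxy}'
    if fm.empty:
        print("Fama-MacBeth: no cross-section had enough observations")
        return fm
    
    # Overlapping h-day forward returns make the daily gammas serially
    # correlated, so use a Newey-West (Bartlett kernel) standard error
    # with horizon - 1 lags instead of the i.i.d. formula
    g = fm[gc].values
    T = len(g)
    mu = g.mean()
    e = g - mu
    n_lags = min(max(horizon - 1, 0), T - 1)
    lrv = np.sum(e ** 2) / T
    for k in range(1, n_lags + 1):
        w = 1 - k / (n_lags + 1)
        lrv += 2 * w * np.sum(e[k:] * e[:-k]) / T
    se = np.sqrt(lrv / T)
    t = mu / se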
    
    print(f"\n=== Fama-MacBeth: {divop_proxy} -> "
          f"{horizon}-day fwd returns ===")
    print(f"  Mean gamma: {mu:.6f}, t-stat: {t:.3f}")
    if t < -1.96:   print("  -> Supports Miller (1977)")
    elif t > 1.96:   print("  -> Supports Varian (1985)")
    else:            print("  -> Inconclusive at 5%")
    return fm

42.2 Application 2: DIVOP and Earnings Announcements

Following Berkman et al. (2009), we test whether high-DIVOP stocks experience negative abnormal returns around earnings announcements, as uncertainty resolution reduces the optimism premium.

def divop_earnings_event(merged_df, ea_dates_df,
                          divop_proxy='suv', window=(-1, 3)):
    """
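    Sort announcements into DIVOP quintiles using the pre-EA proxy value.
    Returns the event table with quintile labels; CARs over `window` are
    not computed here and must be added from the return data.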
    Miller predicts: Q5 (high DIVOP) has lower CAR than Q1 (low DIVOP).
    """
    df = merged_df.copy()
    ea = ea_dates_df.copy()
    
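    # Pre-EA DIVOP value: most recent trading day at least 5 calendar days
    # before the announcement. An exact-date match would silently drop
    # events whose pre-date falls on a weekend or holiday. The 10-day
    # look-back tolerance is an assumption and can be tightened.
    ea['pre_date'] = ea['ea_date'] - pd.Timedelta(days=5)
    ea = pd.merge_asof(
        ea.sort_values('pre_date'),
        df[['ticker','date',divop_proxy]].dropna().sort_values('date'),
        left_on='pre_date', right_on='date', by='ticker',
        direction='backward', tolerance=pd.Timedelta(days=10)
    ).dropna(subset=[divop_proxy])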
    ea['divop_q'] = pd.qcut(
        ea[divop_proxy], 5,
        labels=['Q1 Low','Q2','Q3','Q4','Q5 High'],
        duplicates='drop'
    )
    
    print(f"\n=== EA Event Study by {divop_proxy} quintile ===")
    print(f"  Window: ({window[0]}, {window[1]}) days")
    print(f"  Miller predicts: Q5 has lower CAR than Q1")
    return ea
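
The event-window CARs themselves could be computed along the following lines. This is a minimal sketch, not the procedure of Berkman et al. (2009): `event_window_car` is a hypothetical helper, the column `abn_ret` is assumed to hold a daily abnormal return (for example, a market-adjusted return) prepared beforehand, and day 0 is taken to be the first trading day on or after the announcement date.

```python
import numpy as np
import pandas as pd

def event_window_car(returns_df, events_df, window=(-1, 3), ret_col='abn_ret'):
    """
    Hypothetical helper: sum `ret_col` over trading days
    [window[0], window[1]] relative to day 0 for each (ticker, ea_date).
    """
    n_days = window[1] - window[0] + 1
    by_ticker = {t: g.sort_values('date').reset_index(drop=True)
                 for t, g in returns_df.groupby('ticker')}
    out = []
    for ev in events_df.itertuples(index=False):
        g = by_ticker.get(ev.ticker)
        if g is None:
            continue
        # Position of day 0 in this ticker's trading-day sequence
        pos0 = g['date'].searchsorted(ev.ea_date, side='left')
        lo, hi = pos0 + window[0], pos0 + window[1]
        if pos0 >= len(g) or lo < 0 or hi >= len(g):
            continue  # window not fully observed
        # min_count makes the CAR NaN if any day in the window is missing
        car = g[ret_col].iloc[lo:hi + 1].sum(min_count=n_days)
        out.append({'ticker': ev.ticker, 'ea_date': ev.ea_date, 'car': car})
    return pd.DataFrame(out)

# Toy data: one ticker, ten business days, two announcements
dates = pd.bdate_range('2024-01-01', periods=10)
rets = pd.DataFrame({'ticker': 'AAA', 'date': dates,
                     'abn_ret': np.arange(10) / 100})
events = pd.DataFrame({'ticker': ['AAA', 'AAA'],
                       'ea_date': [dates[4], pd.Timestamp('2024-01-06')]})
cars = event_window_car(rets, events)
print(cars)
```

The resulting `car` column can be merged onto the quintile table returned by `divop_earnings_event` on `ticker` and `ea_date`, then averaged by `divop_q` to compare Q5 with Q1.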

42.3 Application 3: Composite DIVOP Index via PCA

When a single summary measure of disagreement is needed, PCA on the battery of standardized proxies extracts the common “disagreement factor.”

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def composite_divop_pca(merged_df, proxies=None):
    """Extract first principal component from standardized DIVOP proxies."""
    if proxies is None:
        proxies = ['dto','suv','total_volatility','idio_volatility',
                   'baspread','amihud_daily']
    avail = [p for p in proxies if p in merged_df.columns]
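    # Copy so that adding a column does not write into a view of merged_df
    data = merged_df[['ticker','date'] + avail].dropna().copy()
    
    scaler = StandardScaler()
    X = scaler.fit_transform(data[avail])
    
    # Cannot request more components than there are available proxies
    n_comp = min(3, len(avail))
    pca = PCA(n_components=n_comp)
    factors = pca.fit_transform(X)
    
    # Orient PC1 so its loadings are positive on balance, and apply the
    # same sign to the scores and to the reported PC1 loadings
    sign = -1.0 if pca.components_[0].sum() < 0 else 1.0
    data['divop_composite'] = sign * factors[:, 0]
    
    loadings = pd.DataFrame(
        pca.components_.T, index=avail,
        columns=[f'PC{i+1}' for i in range(n_comp)]
    )
    loadings['PC1'] *= sign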
    
    print(f"\n=== PCA Composite DIVOP ===")
    print(f"Variance explained: "
          f"{pca.explained_variance_ratio_[:3].round(3)}")
    print(f"\nLoadings:\n{loadings.to_string(float_format='{:.4f}'.format)}")
    return data[['ticker','date','divop_composite']], loadings
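To build intuition for what the first principal component captures, it can help to run the extraction on synthetic data with a known common factor. The sketch below is illustrative only: it uses a plain NumPy SVD rather than scikit-learn, and the helper name `first_pc`, the six simulated proxies, and their loadings (0.8 on the factor, 0.6 on noise) are assumptions chosen for the example, not estimates from Vietnamese data.

```python
import numpy as np

def first_pc(X):
    """First principal component of column-standardized X via SVD.
    Returns (scores, loadings, share of total variance explained)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    w = Vt[0]
    if w.sum() < 0:          # orient so loadings are positive on balance
        w = -w
    share = s[0] ** 2 / np.sum(s ** 2)
    return Z @ w, w, share

rng = np.random.default_rng(0)
n = 2000
f = rng.normal(size=n)       # latent "disagreement factor" (simulated)
# Six noisy proxies, each loading 0.8 on the factor plus independent noise
X = np.column_stack([0.8 * f + 0.6 * rng.normal(size=n) for _ in range(6)])

score, w, share = first_pc(X)
print("corr(PC1, factor):", round(float(np.corrcoef(score, f)[0, 1]), 3))
print("loadings:", w.round(3))
print("variance share of PC1:", round(float(share), 3))
```

In this simulated setting every proxy loads on the factor with the same sign, so the oriented PC1 correlates strongly with the latent factor. With real proxies the loadings will generally differ in size, and a proxy with a near-zero or negative loading is worth inspecting before it is included in the composite.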

43 Conclusion and Practical Recommendations

This chapter has provided a comprehensive methodology for constructing seven distinct proxies for divergence of investor opinion adapted to the Vietnamese equity market. We conclude with practical recommendations:

1. Prefer multiple proxies. No single DIVOP measure is without limitations. We recommend constructing and reporting results for at least three proxies spanning different economic channels (volume, volatility, spreads or analyst-based).

2. Account for Vietnam-specific microstructure. Daily price limits, T+2 settlement, foreign ownership constraints, and the order-driven market structure all affect DIVOP properties. Flag limit-hit days, include exchange fixed effects, and control for foreign ownership.

3. Vietnam as a natural laboratory for Miller (1977). The absence of short selling through 2024 and the dominance of retail investors create conditions that closely match Miller’s theoretical setting. The introduction of short selling in 2025 creates a natural experiment for examining how relaxation of short-sale constraints affects the DIVOP-return relation.

4. Control for analyst coverage when using DISP measures. With typical coverage of 5–10 analysts per firm, forecast dispersion is estimated with greater noise than in developed markets. Always include the number of analysts as a control variable and conduct robustness checks with varying minimum-analyst thresholds.

5. Consider constructing a composite index. When researchers need a single summary measure of disagreement, the PCA-based composite index described in Section 42.3 provides a principled approach to aggregating information across the individual proxies. The first principal component typically explains 30–50% of the total variance of the standardized DIVOP measures.

6. Winsorize aggressively. Several DIVOP proxies (particularly DISP1, Amihud ILLIQ, and SUV) exhibit extreme outliers in the Vietnamese data. Winsorization at the 1st and 99th percentiles (or even 2nd and 98th for DISP1) is essential for obtaining reliable regression results.

7. Be cautious about causal inference. DIVOP proxies are endogenous: they respond to the same firm characteristics (size, leverage, growth) that also affect returns. Researchers should use appropriate controls, consider instrumental variables where feasible, and be explicit about the limitations of their identification strategy.
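The winsorization in recommendation 6 can be sketched as follows. This is a minimal illustration rather than the procedure used elsewhere in the chapter: the helper name `winsorize` and its parameters are hypothetical, and whether to winsorize on the pooled sample (`by=None`) or within each cross-section (for example `by='date'`) is a design choice the text leaves open.

```python
import numpy as np
import pandas as pd

def winsorize(df, cols, lower=0.01, upper=0.99, by=None):
    """Clip each column in `cols` at its lower/upper quantiles.
    by=None uses pooled quantiles; otherwise quantiles are computed
    within each group (e.g. by='date' for cross-sectional winsorizing)."""
    out = df.copy()
    for c in cols:
        if by is None:
            lo, hi = out[c].quantile(lower), out[c].quantile(upper)
        else:
            g = out.groupby(by)[c]
            lo = g.transform(lambda s: s.quantile(lower))
            hi = g.transform(lambda s: s.quantile(upper))
        # Missing values stay missing; only extreme values are pulled in
        out[c] = out[c].clip(lower=lo, upper=hi)
    return out
```

For example, `winsorize(merged_df, ['suv', 'amihud_daily'], by='date')` would clip both proxies at the 1st and 99th percentiles of each trading day, while passing `lower=0.02, upper=0.98` gives the tighter bounds suggested for DISP1.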

The DIVOP framework is particularly relevant for the Vietnamese market at this point in its development. As the market matures toward potential FTSE Emerging Market reclassification, as short selling becomes more widely available, and as institutional investor participation grows, the dynamics of opinion divergence and its pricing implications are likely to evolve significantly. The methodology presented in this chapter provides researchers with the tools to document and analyze these changes as they unfold.