27 Earnings Management: Detection and Measurement

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats
from typing import Optional
import warnings
warnings.filterwarnings("ignore")

plt.rcParams.update({
    "figure.dpi": 150,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "font.size": 11,
})

Earnings management refers to the purposeful intervention by managers in the financial reporting process to achieve outcomes that serve private objectives, whether to meet analyst forecasts, trigger bonus thresholds, avoid covenant violations, or influence equity valuations. While the concept is intuitive, its rigorous detection poses one of the most enduring methodological challenges in empirical accounting research.

The difficulty is fundamental: researchers observe total accruals, which combine a legitimate component (reflecting genuine economic activity) with a discretionary component (reflecting managerial intervention). Separating these two components requires a model of what accruals should be absent manipulation, what the literature calls non-discretionary accruals (NDA). Any residual, the gap between observed accruals and the model’s prediction, is then attributed to managerial discretion. The quality of the detection method therefore hinges entirely on the quality of this model.

This chapter examines three foundational approaches to measuring earnings management: the Healy (1985) model, the Jones (1991) model, and the modified Jones model of Dechow, Sloan, and Sweeney (1995). Alongside two simpler benchmarks, the DeAngelo (1986) model and an industry model, we evaluate their statistical properties using simulation. We then adapt the analysis to the institutional setting of Vietnam, where distinct governance structures, accounting standards, and enforcement regimes create both unique incentives for earnings management and unique challenges for its detection.

27.0.1 Defining Earnings Management

A useful taxonomy distinguishes three forms:

Accrual-based earnings management (AEM). Managers exploit discretion within accounting standards to shift the timing of revenue or expense recognition. Examples include aggressive revenue recognition, delays in write-downs, or manipulation of allowance estimates. This form does not alter the firm’s underlying cash flows.
Real earnings management (REM). Managers take genuine economic actions (e.g., overproduction to reduce unit costs, cutting R&D or advertising expenditure, offering price discounts to accelerate sales) that have real cash flow consequences (Roychowdhury 2006). These actions may improve current-period earnings at the expense of future value.
Classification shifting. Managers reclassify core expenses as non-recurring items to inflate core earnings without changing bottom-line net income. This form leaves both total accruals and cash flows unchanged.

27.0.2 Why Vietnam?

Vietnam’s institutional environment amplifies several channels relevant to earnings management research:

State-owned enterprise (SOE) incentives. Partially privatized SOEs face dual pressures: political targets from state shareholders and market expectations from minority investors.
Regulatory enforcement. Vietnam’s State Securities Commission (SSC) has limited resources and a short institutional history relative to bodies like the U.S. SEC. Weaker enforcement reduces the expected cost of manipulation, potentially increasing its prevalence (Leuz, Nanda, and Wysocki 2003).
Accounting standards. VAS (Vietnamese Accounting Standards) are based on older IAS versions and have not fully converged with IFRS. Certain VAS provisions, such as rules-based revenue recognition criteria and prescribed depreciation methods, constrain some forms of discretion while creating predictable opportunities for others.
Benchmark-beating behavior. Burgstahler and Dichev (1997) documented discontinuities around zero earnings and zero earnings changes in U.S. data. The Vietnamese market, with its high retail participation and emphasis on headline profitability, may exhibit similar or even more pronounced patterns.

27.1 Models of Non-Discretionary Accruals

27.1.1 Notation and Setup

Let $i$ index firms and $t$ index fiscal years. Define:

Table 27.1: Notation for earnings management models

Symbol	Definition
$TA_{i,t}$	Total accruals (scaled by lagged assets)
$NDA_{i,t}$	Non-discretionary accruals (model prediction)
$DA_{i,t}$	Discretionary accruals: $DA_{i,t} = TA_{i,t} - NDA_{i,t}$
$A_{i,t}$	Total assets
$\Delta Rev_{i,t}$	Change in revenues, scaled by $A_{i,t-1}$
$\Delta Rec_{i,t}$	Change in net receivables, scaled by $A_{i,t-1}$
$PPE_{i,t}$	Gross property, plant, and equipment, scaled by $A_{i,t-1}$
$PART_{i,t}$	Indicator equal to 1 for the test (event) year

Total accruals are computed using the balance sheet approach:

\[ TA_{i,t} = \frac{(\Delta CA_{i,t} - \Delta Cash_{i,t}) - (\Delta CL_{i,t} - \Delta STD_{i,t}) - Dep_{i,t}}{A_{i,t-1}} \tag{27.1}\]

where $\Delta CA$ is the change in current assets, $\Delta Cash$ is the change in cash, $\Delta CL$ is the change in current liabilities, $\Delta STD$ is the change in short-term debt, and $Dep$ is depreciation expense.
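As a quick arithmetic check of Equation 27.1, the sketch below evaluates the formula for a single firm-year. The function name `total_accruals` and all input amounts are hypothetical and chosen only for illustration.

```python
def total_accruals(d_ca, d_cash, d_cl, d_std, dep, lag_assets):
    """Balance-sheet total accruals scaled by lagged assets (Equation 27.1)."""
    if lag_assets <= 0:
        raise ValueError("lagged total assets must be positive")
    return ((d_ca - d_cash) - (d_cl - d_std) - dep) / lag_assets


# Hypothetical firm-year, amounts in VND billions
ta = total_accruals(d_ca=120.0, d_cash=20.0, d_cl=50.0, d_std=10.0,
                    dep=30.0, lag_assets=1000.0)
print(round(ta, 4))  # 0.03
```

Here non-cash working capital rises by 100 - 40 = 60 and depreciation removes 30, so total accruals are 30, or 3% of lagged assets.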

27.1.2 Five Models

We implement five models, each estimating $NDA$ during a firm-specific estimation window and computing $DA$ for the test year as the residual.

Model 1: Healy (1985). Non-discretionary accruals equal the mean of total accruals during the estimation period:

\[ NDA^{Healy}_{i,t} = \frac{1}{T} \sum_{s \in \text{est}} TA_{i,s} \tag{27.2}\]

Here $T$ is the number of years in the estimation period. This is the simplest possible benchmark. Its limitation is that it treats all time-variation in accruals as discretionary, even variation driven by changes in the firm’s economic environment.

Model 2: DeAngelo (1986). Non-discretionary accruals equal last period’s total accruals:

\[ NDA^{DeAngelo}_{i,t} = TA_{i,t-1} \tag{27.3}\]

This is equivalent to assuming that the change in total accruals is entirely discretionary. It performs well when accruals follow a random walk, but poorly when they exhibit mean-reversion or trend.
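The two benchmarks can disagree noticeably on the same series. A minimal sketch, assuming a hypothetical six-year series of scaled total accruals in which the last year is the test year:

```python
from statistics import mean

# Hypothetical scaled total accruals for one firm; the last entry is the test year
ta = [0.02, 0.04, 0.03, 0.05, 0.01, 0.09]
est, test = ta[:-1], ta[-1]

nda_healy = mean(est)      # Equation 27.2: estimation-period mean
nda_deangelo = est[-1]     # Equation 27.3: prior-year total accruals

da_healy = test - nda_healy
da_deangelo = test - nda_deangelo
print(round(da_healy, 4), round(da_deangelo, 4))  # 0.06 0.08
```

Because the prior year happened to be unusually low, the DeAngelo benchmark attributes more of the test-year accrual to discretion than the Healy benchmark does.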

Model 3: Jones (1991). This model controls for changes in a firm’s economic environment by regressing accruals on revenue changes and the level of fixed assets:

\[ TA_{i,t} = \alpha_1 \frac{1}{A_{i,t-1}} + \alpha_2 \Delta Rev_{i,t} + \alpha_3 PPE_{i,t} + \varepsilon_{i,t} \tag{27.4}\]

The parameters $\hat{\alpha}_1, \hat{\alpha}_2, \hat{\alpha}_3$ are estimated on the estimation-period data. Non-discretionary accruals for the test year are the fitted values from this regression applied to test-year covariates.

The economic logic is that revenue growth generates legitimate working capital accruals (higher receivables and inventory), while fixed assets proxy for non-discretionary depreciation charges. The intercept is scaled by lagged assets rather than included as a conventional constant, following Jones (1991).

Model 4: Modified Jones (Dechow, Sloan, and Sweeney 1995). The modification adjusts revenue changes for changes in receivables during the test year, on the premise that credit revenue growth is more susceptible to manipulation than cash revenue growth:

\[ NDA^{ModJones}_{i,t} = \hat{\alpha}_1 \frac{1}{A_{i,t-1}} + \hat{\alpha}_2 (\Delta Rev_{i,t} - \Delta Rec_{i,t}) + \hat{\alpha}_3 PPE_{i,t} \tag{27.5}\]

The coefficients are still estimated from the unadjusted Jones model (Equation 27.4) on estimation-period data; the change in receivables is subtracted from the change in revenues only when computing fitted values for the test year.
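The relationship between the two Jones-type measures can be sketched on simulated data. Everything below (the seed, the coefficient values, the test-year inputs) is hypothetical; the sketch uses plain least squares from NumPy rather than any particular econometrics package.

```python
import numpy as np

rng = np.random.default_rng(0)
n_est = 12  # hypothetical estimation-window length

# Hypothetical scaled regressors: 1/A_{t-1}, dRev, PPE (no conventional constant)
X_est = np.column_stack([
    rng.uniform(0.001, 0.01, n_est),
    rng.normal(0.05, 0.10, n_est),
    rng.uniform(0.3, 0.7, n_est),
])
ta_est = X_est @ np.array([0.5, 0.06, -0.04]) + rng.normal(0, 0.02, n_est)

# Stage 1: estimate Equation 27.4 on estimation-period data only
alpha_hat, *_ = np.linalg.lstsq(X_est, ta_est, rcond=None)

# Test year: covariates and observed accruals (all values hypothetical)
one_at, d_rev, d_rec, ppe, ta_test = 0.005, 0.20, 0.08, 0.5, 0.04

nda_jones = alpha_hat @ np.array([one_at, d_rev, ppe])
nda_mod = alpha_hat @ np.array([one_at, d_rev - d_rec, ppe])  # Equation 27.5

da_jones = ta_test - nda_jones
da_mod = ta_test - nda_mod
```

By construction the two measures differ by exactly $\hat{\alpha}_2 \Delta Rec_{i,t}$, so when $\hat{\alpha}_2 > 0$ and receivables grow, the modified model attributes more of the test-year accrual to discretion.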

Model 5: Industry Model. This model assumes that the common component of accruals within an industry captures non-discretionary variation:

\[ TA_{i,t} = \phi_0 + \phi_1 \cdot \widetilde{TA}_{j,t} + \eta_{i,t} \tag{27.6}\]

where $\widetilde{TA}_{j,t}$ is the median total accrual in year $t$ across all firms in industry $j$ other than firm $i$. The coefficients $\phi_0$ and $\phi_1$ are estimated on estimation-period data.
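One way to build a regressor that varies by year and excludes the firm itself is a leave-one-out median within each industry-year cell. The sketch below is an illustration under assumed column names (`ticker`, `sic2`, `fyear`, `acc_at`) and hypothetical data; `loo_median` is a helper introduced here for the example.

```python
import pandas as pd

# Hypothetical panel: one industry, two years, scaled total accruals
df = pd.DataFrame({
    "ticker": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "sic2":   ["10"] * 8,
    "fyear":  [2020] * 4 + [2021] * 4,
    "acc_at": [0.01, 0.03, 0.05, 0.20, 0.02, 0.04, 0.06, -0.10],
})


def loo_median(s: pd.Series) -> pd.Series:
    """Median of the other firms' accruals within one industry-year cell."""
    return pd.Series(
        [s.drop(idx).median() for idx in s.index], index=s.index
    )


# One value per firm-year: the median over the firm's industry peers that year
df["acc_ind"] = df.groupby(["sic2", "fyear"])["acc_at"].transform(loo_median)
print(df)
```

A cell containing a single firm has no peers, so its leave-one-out median is missing and the observation would drop out of the industry regression.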

27.1.3 Implementation

def calc_accruals(df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute total accruals using the balance sheet approach.
    
    Expected columns: gvkey/ticker, year/fyear, at, act, che, lct, dlc,
                      dp, sale, rect, ppegt
    """
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    year_col = "year" if "year" in df.columns else "fyear"
    
    df = df.sort_values([firm_col, year_col]).copy()
    
    # Lagged values and changes
    g = df.groupby(firm_col)
    df["lag_at"] = g["at"].shift(1)
    df["d_ca"] = g["act"].diff()
    df["d_cash"] = g["che"].diff()
    df["d_cl"] = g["lct"].diff()
    df["d_std"] = g["dlc"].diff()
    df["d_rev"] = g["sale"].diff()
    df["d_rec"] = g["rect"].diff()
    
    # Total accruals (raw)
    df["acc_raw"] = (df["d_ca"] - df["d_cash"] - df["d_cl"] + df["d_std"]) - df["dp"]
    
    return df

def fit_healy(df: pd.DataFrame) -> pd.DataFrame:
    """Healy (1985): NDA = mean accruals in estimation period."""
    est_mean = df.loc[~df["part"], "acc_at"].mean()
    df = df.copy()
    df["nda_healy"] = est_mean
    df["da_healy"] = df["acc_at"] - df["nda_healy"]
    return df


def fit_deangelo(df: pd.DataFrame) -> pd.DataFrame:
    """DeAngelo (1986): NDA = prior-period total accruals."""
    df = df.copy()
    df["nda_deangelo"] = df["acc_at"].shift(1)
    df["da_deangelo"] = df["acc_at"] - df["nda_deangelo"]
    return df


def fit_jones(df: pd.DataFrame) -> pd.DataFrame:
    """Jones (1991): Regression-based NDA controlling for revenue and PPE."""
    df = df.copy()
    est = df[~df["part"]].dropna(subset=["acc_at", "one_at", "d_rev_at", "ppe_at"])
    
    if len(est) < 5:
        df["nda_jones"] = np.nan
        df["da_jones"] = np.nan
        return df
    
    y = est["acc_at"]
    X = est[["one_at", "d_rev_at", "ppe_at"]]
    model = sm.OLS(y, X).fit()
    
    pred_X = df[["one_at", "d_rev_at", "ppe_at"]].copy()
    df["nda_jones"] = model.predict(pred_X)
    df["da_jones"] = df["acc_at"] - df["nda_jones"]
    return df


def fit_mod_jones(df: pd.DataFrame) -> pd.DataFrame:
    """Modified Jones (Dechow et al., 1995): Adjust revenues for receivables."""
    df = df.copy()
    est = df[~df["part"]].dropna(subset=["acc_at", "one_at", "d_rev_at", "ppe_at"])
    
    if len(est) < 5:
        df["nda_mod_jones"] = np.nan
        df["da_mod_jones"] = np.nan
        return df
    
    # Estimate on unadjusted Jones model
    y = est["acc_at"]
    X = est[["one_at", "d_rev_at", "ppe_at"]]
    model = sm.OLS(y, X).fit()
    
    # Predict using adjusted revenue (subtract receivable changes)
    pred_X = df[["one_at", "d_rev_alt_at", "ppe_at"]].copy()
    pred_X.columns = ["one_at", "d_rev_at", "ppe_at"]
    df["nda_mod_jones"] = model.predict(pred_X)
    df["da_mod_jones"] = df["acc_at"] - df["nda_mod_jones"]
    return df


def fit_industry(df: pd.DataFrame) -> pd.DataFrame:
    """Industry model: NDA = f(industry median accruals)."""
    df = df.copy()
    est = df[~df["part"]].dropna(subset=["acc_at", "acc_ind"])
    
    if len(est) < 5:
        df["nda_industry"] = np.nan
        df["da_industry"] = np.nan
        return df
    
    y = est["acc_at"]
    X = sm.add_constant(est["acc_ind"])
    model = sm.OLS(y, X).fit()
    
    pred_X = sm.add_constant(df["acc_ind"])
    df["nda_industry"] = model.predict(pred_X)
    df["da_industry"] = df["acc_at"] - df["nda_industry"]
    return df

def prepare_model_vars(df: pd.DataFrame) -> pd.DataFrame:
    """Add scaled variables needed by the five NDA models."""
    df = calc_accruals(df)
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    
    df["sic2"] = df["sic"].astype(str).str[:2]
    df["acc_at"] = df["acc_raw"] / df["lag_at"]
    df["one_at"] = 1.0 / df["lag_at"]
    df["d_rev_at"] = df["d_rev"] / df["lag_at"]
    df["d_rev_alt_at"] = (df["d_rev"] - df["d_rec"]) / df["lag_at"]
    df["ppe_at"] = df["ppegt"] / df["lag_at"]
    
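    # Industry median accruals, pooled over all estimation-period firm-years.
    # NOTE: pooling over years makes acc_ind constant within each firm, so the
    # regression in fit_industry() has no slope to identify and its fitted
    # values reduce to the firm's estimation-period mean (the Healy benchmark).
    est_acc = df.loc[df["part"] == False, ["sic2", "acc_at"]].copy()
    ind_median = est_acc.groupby("sic2")["acc_at"].median().rename("acc_ind")
    df = df.merge(ind_median, on="sic2", how="left")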
    
    return df


def get_all_nda(df: pd.DataFrame) -> pd.DataFrame:
    """
    Apply all five NDA models on a firm-by-firm basis.
    Returns a DataFrame with DA columns for each model.
    """
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    year_col = "year" if "year" in df.columns else "fyear"
    
    df_mod = prepare_model_vars(df)
    df_mod["part"] = df_mod["part"].astype(bool)
    
    results = []
    for firm, group in df_mod.groupby(firm_col):
        g = group.sort_values(year_col).copy()
        g = fit_healy(g)
        g = fit_deangelo(g)
        g = fit_jones(g)
        g = fit_mod_jones(g)
        g = fit_industry(g)
        results.append(g)
    
    return pd.concat(results, ignore_index=True)

27.1.4 The Salkever (1976) Correction

An important but underappreciated methodological issue arises when computing standard errors for discretionary accruals under the Jones-type models. In the standard two-stage procedure, the researcher first estimates the Jones model on the estimation period, then computes discretionary accruals for the test year as a prediction error. But the standard error of a prediction error is larger than the standard error of a fitted residual, because it incorporates parameter uncertainty from the first-stage estimation. Ignoring this distinction leads to understated standard errors and inflated rejection rates, exactly the problem documented in Dechow, Sloan, and Sweeney (1995).

Salkever (1976) provides an elegant solution: run a single regression on the combined estimation and test periods, including a dummy variable $PART$ for the test year. The coefficient on $PART$ equals the prediction error (discretionary accruals), and its standard error correctly accounts for first-stage estimation uncertainty.

For the Jones model, the Salkever single-stage regression is:

\[ TA_{i,t} = \alpha_1 \frac{1}{A_{i,t-1}} + \alpha_2 \Delta Rev_{i,t} + \alpha_3 PPE_{i,t} + \delta \cdot PART_{i,t} + u_{i,t} \tag{27.7}\]

When the test period consists of a single observation per firm, as in our design, the coefficient $\hat{\delta}$ is numerically identical to the two-stage $DA$ estimate, but its standard error $\text{se}(\hat{\delta})$ is the correct prediction-error standard error.

def demonstrate_salkever(df_firm: pd.DataFrame) -> pd.DataFrame:
    """
    For a single firm, show that the two-stage Jones DA equals
    the Salkever one-stage coefficient on PART.
    """
    df = prepare_model_vars(df_firm)
    needed = ["acc_at", "one_at", "d_rev_at", "ppe_at", "part"]
    df = df.dropna(subset=needed).copy()
    
    # ── Two-stage approach ──
    est = df[~df["part"]]
    y_est = est["acc_at"]
    X_est = est[["one_at", "d_rev_at", "ppe_at"]]
    fm_stage1 = sm.OLS(y_est, X_est).fit()
    
    df["nda_two_stage"] = fm_stage1.predict(df[["one_at", "d_rev_at", "ppe_at"]])
    df["da_two_stage"] = df["acc_at"] - df["nda_two_stage"]
    
    # Test-year DA from two-stage
    da_two_stage = df.loc[df["part"], "da_two_stage"].values
    
    # ── Salkever one-stage ──
    df["part_float"] = df["part"].astype(float)
    y_full = df["acc_at"]
    X_full = df[["one_at", "d_rev_at", "ppe_at", "part_float"]]
    fm_salkever = sm.OLS(y_full, X_full).fit()
    
    da_salkever = fm_salkever.params["part_float"]
    # The two-stage procedure does not deliver a valid standard error for DA
    se_salkever = fm_salkever.bse["part_float"]
    
    return pd.DataFrame({
        "Method": ["Two-stage Jones", "Salkever one-stage"],
        "DA estimate": [da_two_stage[0] if len(da_two_stage) else np.nan,
                        da_salkever],
        "Correct SE": ["Not available", f"{se_salkever:.6f}"],
        "t-statistic": ["Biased", f"{fm_salkever.tvalues['part_float']:.4f}"],
    })
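The equivalence can also be checked directly with linear algebra. The sketch below uses hypothetical simulated data and NumPy only; it compares the two-stage prediction error and the textbook prediction-error standard error, $s\sqrt{1 + x_0'(X'X)^{-1}x_0}$, with the coefficient and standard error on the dummy in the stacked regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n_est, k = 12, 3

# Hypothetical estimation-period regressors and one test-year observation
X_est = rng.normal(size=(n_est, k))
y_est = X_est @ np.array([0.5, 0.06, -0.04]) + rng.normal(0, 0.03, n_est)
x0 = rng.normal(size=k)
y0 = 0.10

# Two-stage: fit on the estimation period, then form the prediction error
b = np.linalg.solve(X_est.T @ X_est, X_est.T @ y_est)
resid = y_est - X_est @ b
s2 = resid @ resid / (n_est - k)
pred_err = y0 - x0 @ b
se_pred = np.sqrt(s2 * (1 + x0 @ np.linalg.solve(X_est.T @ X_est, x0)))

# One-stage: stack all observations and append the PART dummy as a regressor
part = np.r_[np.zeros(n_est), 1.0]
X_all = np.column_stack([np.vstack([X_est, x0]), part])
y_all = np.r_[y_est, y0]
XtX_inv = np.linalg.inv(X_all.T @ X_all)
coef = XtX_inv @ X_all.T @ y_all
u = y_all - X_all @ coef
sigma2 = u @ u / (len(y_all) - X_all.shape[1])
se_delta = np.sqrt(sigma2 * XtX_inv[-1, -1])

print(np.isclose(coef[-1], pred_err), np.isclose(se_delta, se_pred))
```

The dummy absorbs the test-year observation completely, so the remaining residuals and degrees of freedom are those of the estimation-period regression, which is why both quantities agree.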

27.2 Type I Error Under the Null Hypothesis

27.2.1 Experimental Design

To evaluate whether the five models produce well-calibrated test statistics, we conduct a simulation experiment parallel to Table 2 of Dechow, Sloan, and Sweeney (1995). The procedure is:

Generate a panel of $N$ firms, each with $T$ years of financial statement data.
For each firm, randomly designate one year as the test year ($PART = 1$). By construction, no earnings management occurs in this year.
Estimate discretionary accruals using each of the five models.
Regress $DA$ on $PART$ for each firm and record whether the null $H_0: \delta = 0$ is rejected at the 5% and 1% significance levels.
Compute the rejection rate across all $N$ firms.

If the model is well-specified, rejection rates should equal the nominal test size (5% or 1%). Systematic over-rejection indicates that the model produces biased test statistics, a critical flaw for research that relies on these measures to draw causal inferences.
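To see why calibration can fail, the following stylized Monte Carlo compares two test statistics under a true null. It is a simplified stand-in for the chapter's design, not a replication of it: regressors and outcomes are pure noise, the window length and seed are arbitrary, and the critical value 1.895 is the one-sided 5% point of a Student's t with 7 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(7)
n_firms, n_est, k = 2000, 10, 3
t_crit = 1.895  # one-sided 5% critical value, Student's t with n_est - k = 7 df

naive_hits = 0
correct_hits = 0
for _ in range(n_firms):
    X = rng.normal(size=(n_est, k))
    y = rng.normal(0, 0.03, n_est)   # pure noise: the null holds by construction
    x0 = rng.normal(size=k)
    y0 = rng.normal(0, 0.03)

    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s = np.sqrt(resid @ resid / (n_est - k))
    pred_err = y0 - x0 @ b

    # Naive: scale the prediction error by the residual standard deviation
    naive_hits += int(pred_err / s > t_crit)
    # Corrected: scale by the prediction-error standard deviation
    se_pred = s * np.sqrt(1 + x0 @ XtX_inv @ x0)
    correct_hits += int(pred_err / se_pred > t_crit)

naive_rate = naive_hits / n_firms
correct_rate = correct_hits / n_firms
print(naive_rate, correct_rate)
```

Since the corrected standard error is never smaller than the naive one, the naive test rejects at least as often in every draw. Under these assumptions the corrected rate should sit near 5%, with the exact figures depending on the seed.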

27.2.2 Data Generation

We generate synthetic panel data with firm sizes, growth rates, and accrual parameters chosen to loosely resemble Vietnamese listed firms, while allowing us to know with certainty that no manipulation exists.

def generate_em_panel(
    n_firms: int = 500,
    n_years: int = 15,
    seed: int = 2024,
) -> pd.DataFrame:
    """
    Generate a synthetic panel of Vietnamese-style financial data.
    No earnings management is present by construction.
    
    The data generation process includes:
    - Revenue following a multiplicative random walk with drift
    - Firm-specific Jones-type accrual parameters linking accruals to
      revenue growth and PPE levels
    - Industry codes (sic) and an SOE flag (is_soe), assigned at random
      and not used in the accrual process
    """
    rng = np.random.default_rng(seed)
    
    industries = [10, 20, 25, 41, 46, 47, 52, 62, 64, 68]
    
    records = []
    for i in range(n_firms):
        # Firm characteristics
        sic = rng.choice(industries)
        is_soe = int(rng.random() < 0.30)
        base_assets = rng.lognormal(mean=12, sigma=1.5)  # VND billions
        growth_rate = rng.normal(0.08, 0.04)
        
        # Jones model parameters (firm-specific true DGP)
        true_alpha1 = rng.normal(0, 0.02)
        true_alpha2 = rng.normal(0.06, 0.03)   # revenue-accrual sensitivity
        true_alpha3 = rng.normal(-0.04, 0.02)  # depreciation effect
        
        at_prev = base_assets
        sale_prev = base_assets * rng.uniform(0.5, 1.5)
        rect_prev = sale_prev * rng.uniform(0.05, 0.25)
        
        for t in range(n_years):
            year = 2009 + t
            
            # Evolve fundamentals
            at = at_prev * (1 + growth_rate + rng.normal(0, 0.05))
            sale = sale_prev * (1 + rng.normal(0.06, 0.08))
            rect = sale * rng.uniform(0.05, 0.25)
            ppegt = at * rng.uniform(0.3, 0.7)
            
            # Generate accruals from true Jones DGP + noise
            d_rev = sale - sale_prev
            one_at = 1.0 / at_prev
            d_rev_at = d_rev / at_prev
            ppe_at = ppegt / at_prev
            
            acc_at = (true_alpha1 * one_at 
                      + true_alpha2 * d_rev_at 
                      + true_alpha3 * ppe_at
                      + rng.normal(0, 0.03))
            
            acc_raw = acc_at * at_prev
            
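            # Draw accrual components; d_ca is backed out so that Equation 27.1
            # would reproduce acc_raw. NOTE: the level variables written to the
            # panel below (act, che, lct, dlc) are drawn independently of these
            # changes, so accruals recomputed by calc_accruals() from the levels
            # do not follow the Jones-type process above.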
            dp = ppegt * rng.uniform(0.05, 0.12)
            d_cl = rng.normal(0, at * 0.02)
            d_std = rng.normal(0, at * 0.01)
            d_cash = rng.normal(0, at * 0.02)
            d_ca = acc_raw + dp + d_cl - d_std + d_cash
            
            act = at * rng.uniform(0.3, 0.6)
            che = act * rng.uniform(0.05, 0.2)
            lct = at * rng.uniform(0.15, 0.35)
            dlc = lct * rng.uniform(0.1, 0.4)
            ni = sale * rng.uniform(0.03, 0.12)
            ib = ni
            
            records.append({
                "ticker": f"VN{i:04d}",
                "fyear": year,
                "at": at,
                "act": act,
                "che": che,
                "lct": lct,
                "dlc": dlc,
                "dp": dp,
                "sale": sale,
                "rect": rect,
                "ppegt": ppegt,
                "ni": ni,
                "ib": ib,
                "sic": sic,
                "is_soe": is_soe,
            })
            
            at_prev = at
            sale_prev = sale
            rect_prev = rect
    
    df = pd.DataFrame(records)
    return df

panel_raw = generate_em_panel(n_firms=500, n_years=15, seed=2024)
print(f"Panel: {panel_raw.shape[0]:,} firm-years, "
      f"{panel_raw['ticker'].nunique()} firms")

Panel: 7,500 firm-years, 500 firms

27.2.3 Sample Construction

We construct a sample of 500 firms, each with a randomly designated test year, mirroring the design of Dechow, Sloan, and Sweeney (1995).

def construct_sample(
    df: pd.DataFrame,
    n_sample: int = 500,
    min_est_years: int = 10,
    seed: int = 42,
    selection_filter: Optional[callable] = None,
) -> pd.DataFrame:
    """
    For each firm, randomly assign one year as the test year (part=True).
    Require at least min_est_years of estimation data.
    """
    rng = np.random.default_rng(seed)
    firm_col = "ticker"
    year_col = "fyear"
    
    # Compute accruals and filter for data availability
    df = calc_accruals(df)
    required = ["acc_raw", "lag_at", "d_rev", "d_rec", "ppegt"]
    df = df.dropna(subset=required)
    df = df[df["lag_at"] > 0]
    
    # Require minimum years
    firm_counts = df.groupby(firm_col)[year_col].count()
    eligible = firm_counts[firm_counts >= (min_est_years + 1)].index
    df = df[df[firm_col].isin(eligible)]
    
    if selection_filter is not None:
        df = selection_filter(df)
    
    # Sample n_sample firms
    firms = df[firm_col].unique()
    if len(firms) > n_sample:
        firms = rng.choice(firms, n_sample, replace=False)
    df = df[df[firm_col].isin(firms)].copy()
    
    # For each firm, randomly pick one test year (not the first year)
    parts = []
    for firm, group in df.groupby(firm_col):
        years = group[year_col].sort_values().values
        if len(years) < 2:
            continue
        test_year = rng.choice(years[1:])
        parts.append({firm_col: firm, year_col: test_year, "part": True})
    
    part_df = pd.DataFrame(parts)
    df = df.merge(part_df, on=[firm_col, year_col], how="left")
    # Firm-years without a match are estimation years (part=False)
    df["part"] = df["part"].eq(True)

    return df

sample_1 = construct_sample(panel_raw, n_sample=500, seed=2024)
print(f"Sample 1: {sample_1.shape[0]:,} firm-years, "
      f"{sample_1['ticker'].nunique()} firms, "
      f"{sample_1['part'].sum()} test years")

Sample 1: 7,000 firm-years, 500 firms, 500 test years

27.2.4 Estimating Discretionary Accruals

da_results = get_all_nda(sample_1)

# Verify: peek at test-year DA across models
da_cols = ["da_healy", "da_deangelo", "da_jones", "da_mod_jones", "da_industry"]
test_da = da_results[da_results["part"]][da_cols]
print("Test-year discretionary accruals (first 5 firms):")
test_da.head().round(4)

Test-year discretionary accruals (first 5 firms):

	da_healy	da_deangelo	da_jones	da_mod_jones	da_industry
4	-0.0791	-0.1320	-0.1252	-0.1511	-0.0791
23	-0.0792	0.0813	-0.1799	-0.1426	-0.0792
30	0.0618	0.1403	0.1458	0.1308	0.0618
45	0.2106	0.2069	0.0578	0.5023	0.2106
61	-0.1326	0.0865	-0.1425	-0.1193	-0.1326

27.2.5 Firm-Level Regressions and Rejection Rates

For each firm and each model, we regress $DA$ on $PART$ and record whether the null hypothesis of zero discretionary accruals in the test year is rejected.

def firm_regressions(df: pd.DataFrame, models: list[str]) -> pd.DataFrame:
    """
    For each firm and model, regress DA on PART.
    Return coefficients, std errors, t-stats, and rejection indicators.
    """
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    records = []
    
    for firm, group in df.groupby(firm_col):
        for model in models:
            da_col = f"da_{model}"
            g = group.dropna(subset=[da_col]).copy()
            g["part_float"] = g["part"].astype(float)
            
            if len(g) < 5 or g["part"].sum() == 0:
                continue
            
            try:
                fm = sm.OLS(
                    g[da_col], sm.add_constant(g["part_float"])
                ).fit()
                
                coef = fm.params["part_float"]
                se = fm.bse["part_float"]
                t_stat = fm.tvalues["part_float"]
                df_resid = fm.df_resid
                
                # One-sided p-values
                p_neg = stats.t.cdf(t_stat, df_resid)
                p_pos = 1 - p_neg
                
                records.append({
                    firm_col: firm,
                    "model": model,
                    "coef": coef,
                    "se": se,
                    "t_stat": t_stat,
                    "neg_p01": p_neg < 0.01,
                    "neg_p05": p_neg < 0.05,
                    "pos_p01": p_pos < 0.01,
                    "pos_p05": p_pos < 0.05,
                })
            except Exception:
                continue
    
    return pd.DataFrame(records)

models = ["healy", "deangelo", "jones", "mod_jones", "industry"]
reg_results = firm_regressions(da_results, models)

27.2.6 Results

Table 27.2 reports the distribution of estimated coefficients on $PART$ across firms. Under the null of no manipulation, we expect the mean coefficient to be approximately zero.

coef_stats = (
    reg_results
    .groupby("model")["coef"]
    .agg(["mean", "std", lambda x: x.quantile(0.25),
          "median", lambda x: x.quantile(0.75)])
)
coef_stats.columns = ["Mean", "Std Dev", "Q1", "Median", "Q3"]
coef_stats.index.name = "Model"
coef_stats.round(4)

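Table 27.2: Distribution of firm-level discretionary accrual estimates (coefficient on PART) under the null hypothesis of no earnings management. All five models produce mean estimates near zero, as expected. The Industry and Healy rows coincide because the industry median used in the implementation is constant within each firm, so the industry regression reduces to the firm's estimation-period mean.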

	Mean	Std Dev	Q1	Median	Q3
Model
deangelo	0.0110	0.2584	-0.1604	0.0140	0.1839
healy	0.0047	0.1506	-0.0993	0.0050	0.1066
industry	0.0047	0.1506	-0.0993	0.0050	0.1066
jones	0.0035	0.1739	-0.1134	0.0069	0.1125
mod_jones	0.0068	0.1807	-0.1127	0.0108	0.1177

Table 27.3 reports rejection rates. The critical comparison is whether these rates approximate the nominal test size.

rejection_rates = (
    reg_results
    .groupby("model")[["neg_p01", "neg_p05", "pos_p01", "pos_p05"]]
    .mean()
    .round(4)
)
rejection_rates.columns = [
    "Neg (1%)", "Neg (5%)", "Pos (1%)", "Pos (5%)"
]
rejection_rates.index.name = "Model"
rejection_rates

Table 27.3: Type I error rates for one-sided tests of earnings management under the null hypothesis. Rates exceeding the nominal size (5% or 1%) indicate that the model over-rejects, a significant concern for the Jones and Modified Jones models.

	Neg (1%)	Neg (5%)	Pos (1%)	Pos (5%)
Model
deangelo	0.0109	0.0457	0.0043	0.0478
healy	0.0100	0.0380	0.0080	0.0500
industry	0.0100	0.0380	0.0080	0.0500
jones	0.0280	0.0800	0.0380	0.1000
mod_jones	0.0320	0.0740	0.0300	0.0900

27.2.7 Binomial Test for Size Distortion

We formally test whether observed rejection rates differ from nominal sizes using a two-sided binomial test. Small $p$-values indicate significant mis-calibration of the test statistic.
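For intuition about the magnitudes involved, the exact test can be written out with the standard library alone. The sketch below uses a common definition of the two-sided exact p-value (the total probability of outcomes no more likely than the observed count); the function names are hypothetical, and the inputs correspond to 25 and 50 rejections among 500 firms at a nominal 5% size.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in logs to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def binom_two_sided(k: int, n: int, p: float) -> float:
    """Sum the probabilities of all outcomes no more likely than k."""
    p_obs = binom_pmf(k, n, p)
    total = sum(
        pm for pm in (binom_pmf(j, n, p) for j in range(n + 1))
        if pm <= p_obs * (1 + 1e-7)  # small tolerance for floating-point ties
    )
    return min(1.0, total)


p_at_nominal = binom_two_sided(25, 500, 0.05)  # observed rate of 5.0%
p_doubled = binom_two_sided(50, 500, 0.05)     # observed rate of 10.0%
print(p_at_nominal, p_doubled)
```

A rejection count exactly at its expected value gives a p-value of essentially one, while a rate twice the nominal size lies roughly five standard deviations above expectation and gives a p-value close to zero.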

def binom_test_rate(series: pd.Series, nominal: float) -> float:
    """Two-sided binomial test for rejection rate = nominal."""
    x = series.dropna()
    k = int(x.sum())
    n = len(x)
    if n == 0:
        return np.nan
    return stats.binomtest(k, n, nominal, alternative="two-sided").pvalue

binom_results = {}
for model, group in reg_results.groupby("model"):
    binom_results[model] = {
        "Neg (1%)": binom_test_rate(group["neg_p01"], 0.01),
        "Neg (5%)": binom_test_rate(group["neg_p05"], 0.05),
        "Pos (1%)": binom_test_rate(group["pos_p01"], 0.01),
        "Pos (5%)": binom_test_rate(group["pos_p05"], 0.05),
    }

binom_df = pd.DataFrame(binom_results).T.round(4)
binom_df.index.name = "Model"
binom_df

Table 27.4: Binomial test p-values for whether rejection rates equal nominal test sizes. Small values (e.g., < 0.05) indicate statistically significant size distortion.

	Neg (1%)	Neg (5%)	Pos (1%)	Pos (5%)
Model
deangelo	0.8117	0.7486	0.3423	0.9150
healy	1.0000	0.2581	0.8236	1.0000
industry	1.0000	0.2581	0.8236	1.0000
jones	0.0006	0.0038	0.0000	0.0000
mod_jones	0.0001	0.0179	0.0002	0.0002

Interpreting Over-Rejection

Table 27.3 shows that the Jones and Modified Jones models reject the true null in 7.4% to 10.0% of firms at the nominal 5% level, and the binomial tests in Table 27.4 confirm that the distortion is statistically significant in every column for both models. The standard two-stage procedure therefore produces anti-conservative test statistics. The Salkever (1976) correction addresses this by computing standard errors that reflect first-stage estimation uncertainty. In practical research on Vietnamese firms, where the time series per firm is often short (10–15 years of listed history), this correction is especially important because parameter uncertainty contributes a larger share of prediction-error variance when there are fewer estimation-period observations.

27.3 Extreme Performance Firms

A well-known weakness of accrual-based models is their poor performance when test firms experience extreme economic performance. Dechow, Sloan, and Sweeney (1995) documented that all five models over-reject the null hypothesis when test firm-years are drawn from the tails of the earnings or cash flow distribution. Kothari, Leone, and Wasley (2005) subsequently proposed “performance matching” as a partial remedy.

The intuition for the problem is straightforward: the Jones model assumes a linear, symmetric relationship between revenue changes and accruals. But firms experiencing extreme growth or contraction generate accruals that deviate nonlinearly from the model’s predictions, even absent any manipulation. This nonlinearity is misattributed to discretionary accruals.

27.3.1 Constructing Extreme-Performance Samples

def add_earnings_deciles(df: pd.DataFrame) -> pd.DataFrame:
    """Compute firm-year earnings (scaled) and assign to deciles."""
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    df = df.copy()
    g = df.groupby(firm_col)
    df["lag_at_earn"] = g["at"].shift(1)
    df["earn"] = df["ib"] / df["lag_at_earn"]
    df["earn_decile"] = pd.qcut(
        df["earn"], 10, labels=False, duplicates="drop"
    ) + 1
    return df

panel_with_earn = add_earnings_deciles(panel_raw)

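# High-earners sample (top decile).
# NOTE: the filter keeps only top-decile firm-years, so each firm's
# estimation window is drawn from extreme-performance years as well.
def filter_high_earn(df):
    df = add_earnings_deciles(df)
    return df[df["earn_decile"] == 10]

# Low-earners sample (bottom decile); the same caveat applies.
def filter_low_earn(df):
    df = add_earnings_deciles(df)
    return df[df["earn_decile"] == 1]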

sample_high = construct_sample(
    panel_raw, n_sample=300, seed=100,
    selection_filter=filter_high_earn
)
sample_low = construct_sample(
    panel_raw, n_sample=300, seed=200,
    selection_filter=filter_low_earn
)

print(f"High-earnings sample: {sample_high['ticker'].nunique()} firms")
print(f"Low-earnings sample: {sample_low['ticker'].nunique()} firms")

High-earnings sample: 164 firms
Low-earnings sample: 203 firms

Table 27.5: Type I error rates (5% one-sided) when test firm-years are drawn from extreme earnings deciles. Over-rejection is expected because accrual models misattribute performance-driven accrual variation to managerial discretion.

extreme_results = {}
for label, sample_df in [("High earners", sample_high), ("Low earners", sample_low)]:
    da = get_all_nda(sample_df)
    regs = firm_regressions(da, models)
    rates = regs.groupby("model")[["neg_p05", "pos_p05"]].mean().round(4)
    rates.columns = [f"{label} Neg(5%)", f"{label} Pos(5%)"]
    extreme_results[label] = rates

# Build the table at top level so the notebook displays it
extreme_df = pd.concat(list(extreme_results.values()), axis=1)
extreme_df.index.name = "Model"
extreme_df

27.3.2 Performance Matching

Kothari, Leone, and Wasley (2005) propose adjusting discretionary accruals by subtracting the DA of a performance-matched firm (one in the same industry with similar ROA). This removes the systematic component of accruals correlated with performance. The matched discretionary accrual is:

\[ DA^{PM}_{i,t} = DA_{i,t} - DA_{i^*,t} \tag{27.8}\]

where $i^*$ is the matched control firm. This approach is particularly relevant in Vietnam, where the cross-section of listed firms includes many high-growth firms alongside stagnant SOEs, a degree of performance heterogeneity that standard models may mischaracterize as managerial discretion.
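A minimal sketch of the matching step in Equation 27.8 follows. It assumes a DataFrame with hypothetical columns `firm`, `industry`, `year`, `roa`, and `da`, and it matches each firm-year to the same-industry, same-year firm with the closest ROA, allowing a control firm to be reused. Other matching choices (for example, lagged ROA or matching without replacement) would need a different implementation.

```python
import numpy as np
import pandas as pd

def performance_match(df, da_col="da", roa_col="roa",
                      group_cols=("industry", "year")):
    """Subtract the DA of the closest-ROA peer in the same industry-year."""
    df = df.dropna(subset=[da_col, roa_col])
    out = []
    for _, grp in df.groupby(list(group_cols)):
        grp = grp.reset_index(drop=True)
        if len(grp) < 2:
            # No peer available, so no matched measure can be computed
            grp["da_matched"] = np.nan
            grp["da_pm"] = np.nan
            out.append(grp)
            continue
        roa = grp[roa_col].to_numpy(dtype=float)
        dist = np.abs(roa[:, None] - roa[None, :])  # pairwise ROA distances
        np.fill_diagonal(dist, np.inf)              # a firm cannot match itself
        match_idx = dist.argmin(axis=1)
        grp["da_matched"] = grp[da_col].to_numpy()[match_idx]
        grp["da_pm"] = grp[da_col] - grp["da_matched"]
        out.append(grp)
    return pd.concat(out, ignore_index=True)

# Hypothetical four-firm industry-year
demo = pd.DataFrame({
    "firm": ["A", "B", "C", "D"],
    "industry": ["10"] * 4,
    "year": [2020] * 4,
    "roa": [0.02, 0.03, 0.15, 0.16],
    "da": [0.01, 0.04, 0.10, 0.07],
})
matched = performance_match(demo)
print(matched[["firm", "roa", "da", "da_matched", "da_pm"]])
```

In this toy example the two low-ROA firms are matched to each other, as are the two high-ROA firms, so each firm's matched measure reflects only the accrual difference relative to a peer with similar performance.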

27.4 Power Analysis

27.4.1 Artificially Introducing Earnings Management

A model that never rejects is useless even if its Type I error rate is perfect. We need to evaluate power: the probability of detecting manipulation when it truly exists. Following Dechow, Sloan, and Sweeney (1995), we introduce known artificial manipulation into test-year financial statements and measure detection rates.

We consider three forms of manipulation at varying magnitudes:

Expense manipulation. Decrease current liabilities by the manipulation amount (e.g., delaying recognition of accrued expenses): \[LCT'_{i,t} = LCT_{i,t} - \lambda \cdot A_{i,t-1}\]
Revenue manipulation. Increase sales and receivables by the manipulation amount (e.g., premature revenue recognition or channel stuffing): \[Sale'_{i,t} = Sale_{i,t} + \lambda \cdot A_{i,t-1}, \quad Rect'_{i,t} = Rect_{i,t} + \lambda \cdot A_{i,t-1}\]
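Margin manipulation. Increase sales by the gross amount needed to inflate net income by $\lambda \cdot A_{i,t-1}$, booking the additional sales as receivables and the associated costs as current liabilities. With $m_i$ denoting the firm's median ratio of net income to sales over the estimation period, the gross amount is $G_{i,t} = \lambda \cdot A_{i,t-1} / m_i$ (set to zero when $m_i \le 0$): \[Sale'_{i,t} = Sale_{i,t} + G_{i,t}, \quad Rect'_{i,t} = Rect_{i,t} + G_{i,t}, \quad LCT'_{i,t} = LCT_{i,t} + (G_{i,t} - \lambda \cdot A_{i,t-1})\]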

The parameter $\lambda$ represents manipulation as a fraction of lagged total assets, ranging from 0% to 50%.

def manipulate(
    df: pd.DataFrame,
    level: float = 0.0,
    manip_type: str = "expense",
) -> pd.DataFrame:
    """
    Introduce artificial earnings management of a given type and level
    into test-year (part=True) observations.
    
    Parameters
    ----------
    df : DataFrame with 'part' indicator and required financial variables.
    level : Manipulation as fraction of lagged total assets.
    manip_type : One of 'expense', 'revenue', 'margin'.
    """
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    year_col = "fyear" if "fyear" in df.columns else "year"
    
    df = df.sort_values([firm_col, year_col]).copy()
    g = df.groupby(firm_col)
    lag_at = g["at"].shift(1)
    manip_amt = lag_at * level
    
    if manip_type == "expense":
        # Decrease current liabilities in test year
        df.loc[df["part"], "lct"] -= manip_amt[df["part"]]
        
    elif manip_type == "revenue":
        # Increase sales and receivables in test year
        df.loc[df["part"], "sale"] += manip_amt[df["part"]]
        df.loc[df["part"], "rect"] += manip_amt[df["part"]]
        df.loc[df["part"], "act"] += manip_amt[df["part"]]
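        # Reverse the fictitious sales in the following year; cast the shifted
        # indicator back to bool so it is a clean boolean mask
        next_part = g["part"].shift(1).fillna(False).astype(bool)
        df.loc[next_part, "sale"] -= manip_amt[next_part]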
        
    elif manip_type == "margin":
        # Firm-level median net-income-to-sales ratio over the estimation
        # (non-test) years, used to gross up the targeted income effect
        ni_ratio = g.apply(
            lambda x: (x.loc[~x["part"], "ni"] / x.loc[~x["part"], "sale"]).median()
        )
        df["_ni_ratio"] = df[firm_col].map(ni_ratio)
        gross_amt = np.where(
            df["_ni_ratio"] > 0, manip_amt / df["_ni_ratio"], 0
        )
        df.loc[df["part"], "sale"] += gross_amt[df["part"]]
        df.loc[df["part"], "rect"] += gross_amt[df["part"]]
        df.loc[df["part"], "act"] += gross_amt[df["part"]]
        net_effect = gross_amt - manip_amt
        df.loc[df["part"], "lct"] += net_effect[df["part"]]
        df.drop(columns=["_ni_ratio"], inplace=True)
    
    return df

levels = [0.0, 0.05, 0.10, 0.20, 0.30, 0.50]
manip_types = ["expense", "revenue", "margin"]

power_records = []

for level in levels:
    for mtype in manip_types:
        # Copy base sample and introduce manipulation
        manip_sample = sample_1.copy()
        if level > 0:
            manip_sample = manipulate(manip_sample, level=level, manip_type=mtype)
        
        # Estimate DA and run tests
        da = get_all_nda(manip_sample)
        regs = firm_regressions(da, models)
        
        # Power = positive one-sided rejection rate at 5%
        power = regs.groupby("model")["pos_p05"].mean()
        for model_name, pwr in power.items():
            power_records.append({
                "level": level,
                "type": mtype,
                "model": model_name,
                "power": pwr,
            })

power_df = pd.DataFrame(power_records)

levels = [0.0, 0.05, 0.10, 0.20, 0.30, 0.50]
manip_types = ["expense", "revenue", "margin"]

from joblib import Parallel, delayed
from itertools import product

def _run_one(level, mtype, base_sample, models):
    manip_sample = base_sample.copy()
    if level > 0:
        manip_sample = manipulate(manip_sample, level=level, manip_type=mtype)
    da = get_all_nda(manip_sample)
    regs = firm_regressions(da, models)
    power = regs.groupby("model")["pos_p05"].mean()
    return [
        {"level": level, "type": mtype, "model": m, "power": p}
        for m, p in power.items()
    ]

# Skip redundant level=0 runs (all manip_types identical when level=0)
combos = [(0.0, "expense")] + [
    (l, m) for l in levels if l > 0 for m in manip_types
]

results = Parallel(n_jobs=-1, verbose=1)(
    delayed(_run_one)(lvl, mt, sample_1, models)
    for lvl, mt in combos
)

# Expand level=0 result across all manip_types
flat = []
for (lvl, mt), batch in zip(combos, results):
    if lvl == 0:
        for mtype in manip_types:
            flat.extend([{**r, "type": mtype} for r in batch])
    else:
        flat.extend(batch)

power_df = pd.DataFrame(flat)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  16 out of  16 | elapsed:  2.0min finished

27.4.2 Power Functions

Figure 27.1 plots the estimated power functions. The key questions are: (i) which model has the highest power for a given manipulation level and type, and (ii) at what magnitudes does manipulation become reliably detectable?

fig, axes = plt.subplots(1, 3, figsize=(14, 5), sharey=True)

model_colors = {
    "healy": "#2196F3",
    "deangelo": "#FF9800",
    "jones": "#4CAF50",
    "mod_jones": "#E91E63",
    "industry": "#9C27B0",
}

for idx, mtype in enumerate(manip_types):
    ax = axes[idx]
    subset = power_df[power_df["type"] == mtype]
    for model_name in models:
        m_data = subset[subset["model"] == model_name].sort_values("level")
        ax.plot(
            m_data["level"] * 100, m_data["power"],
            marker="o", markersize=4,
            label=model_name.replace("_", " ").title(),
            color=model_colors[model_name],
            linewidth=1.5,
        )
    ax.set_title(mtype.title(), fontweight="bold")
    ax.set_xlabel("Manipulation (% of assets)")
    ax.set_ylim(-0.02, 1.02)
    ax.yaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
    ax.axhline(0.05, color="grey", linestyle="--", linewidth=0.7, label="5% size")

axes[0].set_ylabel("Rejection Rate (Power)")
axes[2].legend(fontsize=8, loc="center left", bbox_to_anchor=(1.02, 0.5))
plt.tight_layout()
plt.show()

Figure 27.1: Power functions for the five earnings management detection models across three manipulation types. Power increases with manipulation magnitude but remains low for economically plausible levels (< 10% of assets), highlighting the fundamental difficulty of detecting earnings management.

27.4.3 Summary Statistics

Table 27.6 reports power at selected manipulation levels for the Jones and Modified Jones models, which are most commonly used in applied research.

power_summary = (
    power_df[power_df["model"].isin(["jones", "mod_jones"])]
    .pivot_table(index=["model", "level"], columns="type", values="power")
    .round(3)
)
power_summary.index.names = ["Model", "Level (fraction of assets)"]
power_summary

Table 27.6: Power of the Jones and Modified Jones models at selected manipulation levels. Even at 10% of assets, a large amount of manipulation, detection rates stay below 22% for every manipulation type, underscoring the low statistical power of standard tests.

Model	Level (fraction of assets)	expense	margin	revenue
jones	0.00	0.100	0.100	0.100
	0.05	0.154	0.132	0.142
	0.10	0.218	0.126	0.178
	0.20	0.432	0.122	0.300
	0.30	0.660	0.122	0.436
	0.50	0.912	0.122	0.710
mod_jones	0.00	0.090	0.090	0.090
	0.05	0.130	0.106	0.128
	0.10	0.198	0.174	0.190
	0.20	0.404	0.368	0.422
	0.30	0.636	0.598	0.652
	0.50	0.884	0.878	0.922

Implications for Vietnamese Research

The power analysis has stark implications for earnings management research in Vietnam. The typical Vietnamese listed firm has been listed for 10–15 years, providing far fewer estimation-period observations than in the U.S. context where Dechow, Sloan, and Sweeney (1995) had decades of Compustat data. Shorter estimation windows increase parameter uncertainty in the Jones model, further reducing power. Combined with the noisier financial data common in emerging markets, researchers should interpret non-rejection of the null as uninformative rather than as evidence of clean financial reporting.
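The effect of a short estimation window can be illustrated with the standard OLS prediction-error formula: for test-year regressors $x_0$, the standard error of the prediction error is $\sigma \sqrt{1 + x_0'(X'X)^{-1}x_0}$, and the term under the square root shrinks as the number of estimation years grows. The sketch below simulates that inflation factor for three window lengths. The regressor distributions, the use of a plain constant in place of $1/A_{i,t-1}$, and the test-year values are all assumptions chosen for illustration.

```python
import numpy as np

def prediction_se_factor(X_est: np.ndarray, x_test: np.ndarray) -> float:
    """Multiplier on sigma for a prediction error: sqrt(1 + x0'(X'X)^{-1} x0)."""
    xtx_inv = np.linalg.inv(X_est.T @ X_est)
    return float(np.sqrt(1.0 + x_test @ xtx_inv @ x_test))

rng = np.random.default_rng(0)
# Hypothetical test-year regressors: constant, revenue change, PPE ratio
x_test = np.array([1.0, 0.10, 0.50])

median_factor = {}
for n_est in (8, 14, 30):  # estimation-period lengths in years
    draws = []
    for _ in range(2000):
        X = np.column_stack([
            np.ones(n_est),
            rng.normal(0.08, 0.15, n_est),  # simulated scaled revenue changes
            rng.normal(0.50, 0.10, n_est),  # simulated scaled PPE
        ])
        draws.append(prediction_se_factor(X, x_test))
    median_factor[n_est] = float(np.median(draws))

for n_est, factor in median_factor.items():
    print(f"T = {n_est:2d}: median SE inflation factor = {factor:.3f}")
```

The factor understates the full cost of a short window, because fewer residual degrees of freedom also widen the critical values of the $t$ distribution used in the test.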

27.5 Vietnamese Institutional Context

27.5.1 Channels of Earnings Management

Several features of the Vietnamese institutional environment create distinctive earnings management incentives and opportunities:

Tax-driven manipulation. Vietnamese corporate income tax (CIT) rates have declined from 28% (pre-2009) to 20% (2016 onwards), with preferential rates for firms in Special Economic Zones and high-tech sectors. The close alignment between VAS accounting and tax accounting creates incentives to manage earnings downward to reduce tax obligations—a pattern documented in developing economies with code-law accounting traditions (Ball, Robin, and Wu 2003).

IPO and seasoned equity offering (SEO) incentives. Vietnam has experienced several waves of SOE equitization (partial privatization). Managers have incentives to inflate earnings before share offerings to maximize proceeds. The SSC requires minimum profitability thresholds for listing eligibility, creating sharp incentives around these regulatory cutoffs and a natural setting for the discontinuity analysis of Burgstahler and Dichev (1997).

Real earnings management in manufacturing. Roychowdhury (2006) identifies overproduction, discretionary expenditure cuts, and sales manipulation as the three main channels of real earnings management. Vietnam’s large manufacturing sector (textiles, electronics assembly, food processing) provides ample scope for overproduction-based REM, where unit costs are reduced by spreading fixed overhead across larger production runs.

Related-party transactions. Transactions with affiliated entities are a well-documented channel for earnings manipulation in Asian markets. Vietnamese conglomerates (tập đoàn) often feature complex cross-ownership structures where transfer pricing between subsidiaries can shift profits across reporting entities.

27.5.2 The Earnings Distribution Test

Burgstahler and Dichev (1997) observed a striking discontinuity in the distribution of reported earnings around zero: far more firms report small positive earnings than small losses, relative to what a smooth distribution would predict. This pattern is interpreted as evidence that firms manage earnings to avoid reporting losses.

We apply this test to Vietnamese-style data to illustrate the methodology.

# Generate earnings for a larger sample with benchmark-beating behavior
rng_dist = np.random.default_rng(2024)
n_firms_dist = 2000
n_years_dist = 10

earnings = []
for i in range(n_firms_dist):
    base_earn = rng_dist.normal(0.06, 0.08)
    for t in range(n_years_dist):
        e = base_earn + rng_dist.normal(0, 0.04)
        # Simulate benchmark-beating: firms near zero bump earnings up
        if -0.01 < e < 0.005:
            e += rng_dist.uniform(0.005, 0.015) * (rng_dist.random() < 0.6)
        earnings.append(e)

earn_array = np.array(earnings)

fig, ax = plt.subplots(figsize=(8, 5))
bins = np.arange(-0.30, 0.35, 0.01)
ax.hist(earn_array, bins=bins, color="steelblue", edgecolor="white",
        alpha=0.85, density=True)
ax.axvline(0, color="firebrick", linestyle="--", linewidth=1.5, label="Zero threshold")
ax.set_xlabel("Earnings / Total Assets")
ax.set_ylabel("Density")
ax.set_title("Earnings Distribution Around Zero")
ax.legend()
plt.tight_layout()
plt.show()

Figure 27.2: Distribution of scaled earnings around zero. A discontinuity—excess density just above zero and a deficit just below—would suggest benchmark-beating behavior. The red dashed line marks the zero threshold.

# Year-over-year changes
earn_changes = np.diff(earn_array.reshape(n_firms_dist, n_years_dist), axis=1).flatten()

fig, ax = plt.subplots(figsize=(8, 5))
bins_chg = np.arange(-0.15, 0.15, 0.005)
ax.hist(earn_changes, bins=bins_chg, color="darkorange", edgecolor="white",
        alpha=0.85, density=True)
ax.axvline(0, color="firebrick", linestyle="--", linewidth=1.5, label="Zero change")
ax.set_xlabel("Change in Earnings / Total Assets")
ax.set_ylabel("Density")
ax.set_title("Earnings Change Distribution Around Zero")
ax.legend()
plt.tight_layout()
plt.show()

Figure 27.3: Distribution of year-over-year earnings changes around zero. A discontinuity here suggests that firms manage earnings to avoid reporting declines, even when small.
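The visual impression from the histograms can be supplemented with the standardized difference statistic of Burgstahler and Dichev (1997): the observed count in a bin minus the average count of its two neighbors, divided by the estimated standard deviation of that difference. The sketch below is a minimal implementation under stated assumptions. The function name, bin width, and simulated data are hypothetical, and the variance uses the usual approximation $N p_i (1 - p_i) + \tfrac{1}{4} N (p_{i-1} + p_{i+1})(1 - p_{i-1} - p_{i+1})$.

```python
import numpy as np

def bd_standardized_diff(values, bin_width=0.01, half_bins=30):
    """Return left bin edges and standardized differences for interior bins."""
    values = np.asarray(values, dtype=float)
    # Integer multiples of bin_width guarantee an edge exactly at zero
    edges = bin_width * np.arange(-half_bins, half_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    n = values.size
    p = counts / n
    expected = 0.5 * (counts[:-2] + counts[2:])  # mean of the two neighbors
    p_nb = p[:-2] + p[2:]
    var = n * p[1:-1] * (1 - p[1:-1]) + 0.25 * n * p_nb * (1 - p_nb)
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (counts[1:-1] - expected) / np.sqrt(var)
    return edges[1:-2], z

# Simulated earnings in which 60% of small losses are pushed above zero
rng = np.random.default_rng(1)
earn = rng.normal(0.06, 0.09, 20_000)
small_loss = (earn > -0.01) & (earn < 0)
bumped = small_loss & (rng.random(earn.size) < 0.6)
earn[bumped] += 0.01

left, z = bd_standardized_diff(earn)
z_below = z[np.isclose(left, -0.01)][0]  # bin [-0.01, 0)
z_above = z[np.isclose(left, 0.0)][0]    # bin [0, 0.01)
print(f"z just below zero: {z_below:.1f}, z just above zero: {z_above:.1f}")
```

A large negative value just below zero together with a large positive value just above it is the pattern interpreted as loss avoidance. The same function can be applied to earnings changes.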

27.5.3 Accrual-Based vs. Real Earnings Management

Cohen, Dey, and Lys (2008) document a shift from AEM to REM following the passage of the Sarbanes-Oxley Act (SOX) in 2002, suggesting that tighter regulatory scrutiny redirects manipulation toward less detectable channels. Zang (2012) provides further evidence that firms trade off between AEM and REM based on their relative costs.

In Vietnam, where regulatory enforcement of accounting standards is weaker than in post-SOX America, we might expect AEM to remain the dominant channel. However, as Vietnam moves toward IFRS adoption and strengthens SSC oversight, the AEM-to-REM substitution hypothesis becomes testable.

We can measure REM using the Roychowdhury (2006) approach. Three proxies capture different manipulation channels:

Abnormal cash flow from operations. Estimate normal CFO as a function of sales and sales changes:

\[ \frac{CFO_{i,t}}{A_{i,t-1}} = \beta_0 + \beta_1 \frac{1}{A_{i,t-1}} + \beta_2 \frac{S_{i,t}}{A_{i,t-1}} + \beta_3 \frac{\Delta S_{i,t}}{A_{i,t-1}} + \varepsilon_{i,t} \tag{27.9}\]

Abnormal production costs. Production costs = COGS + change in inventory. Normal levels are modeled as:

\[ \frac{PROD_{i,t}}{A_{i,t-1}} = \beta_0 + \beta_1 \frac{1}{A_{i,t-1}} + \beta_2 \frac{S_{i,t}}{A_{i,t-1}} + \beta_3 \frac{\Delta S_{i,t}}{A_{i,t-1}} + \beta_4 \frac{\Delta S_{i,t-1}}{A_{i,t-1}} + \varepsilon_{i,t} \tag{27.10}\]

Abnormal discretionary expenditures. R&D + advertising + SGA, modeled as:

\[ \frac{DISC_{i,t}}{A_{i,t-1}} = \beta_0 + \beta_1 \frac{1}{A_{i,t-1}} + \beta_2 \frac{S_{i,t-1}}{A_{i,t-1}} + \varepsilon_{i,t} \tag{27.11}\]

Residuals from these regressions serve as proxies for real manipulation. A firm that overproduces will show abnormally high production costs and abnormally low CFO (cash tied up in inventory).
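As a rough sketch of how such residuals might be computed, the code below estimates the normal-CFO regression (Equation 27.9) separately within each industry-year and stores the residual as abnormal CFO. The helper `abnormal_component`, the column names, the minimum group size, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def abnormal_component(df, formula, out_col,
                       group_cols=("industry", "year"), min_obs=10):
    """Store residuals from `formula`, estimated separately within each group."""
    df = df.copy()
    df[out_col] = np.nan
    for _, grp in df.groupby(list(group_cols)):
        if len(grp) < min_obs:
            continue  # too few peers to estimate a "normal" level
        fit = smf.ols(formula, data=grp).fit()
        df.loc[fit.resid.index, out_col] = fit.resid
    return df

# Synthetic firm-years with no real manipulation (hypothetical values)
rng = np.random.default_rng(42)
n = 60
demo = pd.DataFrame({
    "industry": np.repeat(["10", "25"], n // 2),
    "year": 2020,
    "lag_at": rng.lognormal(6, 1, n),
})
demo["sale"] = demo["lag_at"] * rng.uniform(0.5, 1.5, n)
demo["d_sale"] = demo["sale"] * rng.normal(0.06, 0.08, n)
demo["cfo"] = demo["lag_at"] * (0.05 + rng.normal(0, 0.02, n)) + 0.08 * demo["sale"]

# Scale by lagged assets, as in the regression specification
demo["cfo_a"] = demo["cfo"] / demo["lag_at"]
demo["inv_a"] = 1.0 / demo["lag_at"]
demo["s_a"] = demo["sale"] / demo["lag_at"]
demo["ds_a"] = demo["d_sale"] / demo["lag_at"]

demo = abnormal_component(demo, "cfo_a ~ inv_a + s_a + ds_a", "ab_cfo")
print(demo.groupby("industry")["ab_cfo"].agg(["mean", "std"]))
```

Abnormal production costs and abnormal discretionary expenditures can be obtained by calling the same helper with the regressors of Equations 27.10 and 27.11. Because each regression includes an intercept, residuals average to zero within each industry-year, so the measures are informative only about a firm's position relative to its peers.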

27.6 Summary

This chapter examined the measurement and detection of earnings management, with a focus on methodological rigor and adaptation to Vietnam’s institutional environment. The key takeaways are:

Detecting earnings management requires separating discretionary from non-discretionary accruals, which depends critically on the quality of the non-discretionary accrual model. All five canonical models have known weaknesses.
Under the null hypothesis of no manipulation, the Jones and Modified Jones models over-reject when standard two-stage standard errors are used. The Salkever (1976) correction provides properly calibrated inference.
When test firms experience extreme financial performance, all models exhibit severe size distortion, misattributing performance-driven accrual variation to managerial discretion. Performance matching (Kothari, Leone, and Wasley 2005) partially addresses this.
Power analysis reveals that economically plausible levels of manipulation (below 10% of assets) are detected with very low probability. This casts doubt on studies that report null results as evidence of no manipulation.
Vietnam’s institutional features, such as SOE governance, VAS accounting rules, weak enforcement, and thin audit coverage, create a rich setting for earnings management research, but the methodological challenges are amplified by shorter time series and noisier data.

Researchers working with Vietnamese data should: (i) use the Salkever correction for proper inference, (ii) implement performance matching, (iii) consider multiple models and triangulate results, and (iv) supplement accrual-based approaches with real earnings management and distributional tests.

Ball, Ray, Ashok Robin, and Joanna Shuang Wu. 2003. “Incentives Versus Standards: Properties of Accounting Income in Four East Asian Countries.” Journal of Accounting and Economics 36 (1-3): 235–70.

Burgstahler, David, and Ilia Dichev. 1997. “Earnings Management to Avoid Earnings Decreases and Losses.” Journal of Accounting and Economics 24 (1): 99–126.

Cohen, Daniel A, Aiyesha Dey, and Thomas Z Lys. 2008. “Real and Accrual-Based Earnings Management in the Pre- and Post-Sarbanes-Oxley Periods.” The Accounting Review 83 (3): 757–87.

DeAngelo, Linda Elizabeth. 1986. “Accounting Numbers as Market Valuation Substitutes: A Study of Management Buyouts of Public Stockholders.” The Accounting Review 61 (3): 400–420.

Dechow, Patricia M, Richard G Sloan, and Amy P Sweeney. 1995. “Detecting Earnings Management.” The Accounting Review 70 (2): 193–225.

Healy, Paul M. 1985. “The Effect of Bonus Schemes on Accounting Decisions.” Journal of Accounting and Economics 7 (1-3): 85–107.

Jones, Jennifer J. 1991. “Earnings Management During Import Relief Investigations.” Journal of Accounting Research 29 (2): 193–228.

Kothari, Sagar P, Andrew J Leone, and Charles E Wasley. 2005. “Performance Matched Discretionary Accrual Measures.” Journal of Accounting and Economics 39 (1): 163–97.

Leuz, Christian, Dhananjay Nanda, and Peter D Wysocki. 2003. “Earnings Management and Investor Protection: An International Comparison.” Journal of Financial Economics 69 (3): 505–27.

Roychowdhury, Sugata. 2006. “Earnings Management Through Real Activities Manipulation.” Journal of Accounting and Economics 42 (3): 335–70.

Salkever, David S. 1976. “The Use of Dummy Variables to Compute Predictions, Prediction Errors, and Confidence Intervals.” Journal of Econometrics 4 (4): 393–97.

Zang, Amy Y. 2012. “Evidence on the Trade-Off Between Real Activities Manipulation and Accrual-Based Earnings Management.” The Accounting Review 87 (2): 675–703.

# Earnings Management: Detection and Measurement ```{python} #| label: setup #| message: false import pandas as pd import numpy as np import matplotlib.pyplot as plt import matplotlib.ticker as mticker import statsmodels.api as sm import statsmodels.formula.api as smf from scipy import stats from typing import Optional import warnings warnings.filterwarnings("ignore") plt.rcParams.update({ "figure.dpi": 150, "axes.spines.top": False, "axes.spines.right": False, "font.size": 11, }) ``` Earnings management refers to the purposeful intervention by managers in the financial reporting process to achieve outcomes that serve private objectives, whether to meet analyst forecasts, trigger bonus thresholds, avoid covenant violations, or influence equity valuations. While the concept is intuitive, its rigorous detection poses one of the most enduring methodological challenges in empirical accounting research. The difficulty is fundamental: researchers observe *total* accruals, which combine a legitimate component (reflecting genuine economic activity) with a discretionary component (reflecting managerial intervention). Separating these two components requires a model of what accruals *should* be absent manipulation, what the literature calls **non-discretionary accruals (NDA)**. Any residual, the gap between observed accruals and the model's prediction, is then attributed to managerial discretion. The quality of the detection method therefore hinges entirely on the quality of this model. This chapter examines three foundational approaches to measuring earnings management (i.e., the @healy1985effect model, the @jones1991earnings model, and the modified Jones model of @dechow1995detecting) and evaluates their statistical properties using simulation. 
We then adapt the analysis to the institutional setting of Vietnam, where distinct governance structures, accounting standards, and enforcement regimes create both unique incentives for earnings management and unique challenges for its detection. ### Defining Earnings Management A useful taxonomy distinguishes three forms: 1. **Accrual-based earnings management (AEM).** Managers exploit discretion within accounting standards to shift the timing of revenue or expense recognition. Examples include aggressive revenue recognition, delays in write-downs, or manipulation of allowance estimates. This form does not alter the firm's underlying cash flows. 2. **Real earnings management (REM).** Managers take genuine economic actions (e.g., overproduction to reduce unit costs, cutting R&D or advertising expenditure, offering price discounts to accelerate sales) that have real cash flow consequences [@roychowdhury2006earnings]. These actions may improve current-period earnings at the expense of future value. 3. **Classification shifting.** Managers reclassify core expenses as non-recurring items to inflate core earnings without changing bottom-line net income. This form leaves both total accruals and cash flows unchanged. ### Why Vietnam? Vietnam's institutional environment amplifies several channels relevant to earnings management research: - **State-owned enterprise (SOE) incentives.** Partially privatized SOEs face dual pressures: political targets from state shareholders and market expectations from minority investors. - **Regulatory enforcement.** Vietnam's State Securities Commission (SSC) has limited resources and a short institutional history relative to bodies like the U.S. SEC. Weaker enforcement reduces the expected cost of manipulation, potentially increasing its prevalence [@leuz2003earnings]. - **Accounting standards.** VAS (Vietnamese Accounting Standards) are based on older IAS versions and have not fully converged with IFRS. 
Certain VAS provisions, such as rules-based revenue recognition criteria and prescribed depreciation methods, constrain some forms of discretion while creating predictable opportunities for others. - **Benchmark-beating behavior.** @burgstahler1997earnings documented discontinuities around zero earnings and zero earnings changes in U.S. data. The Vietnamese market, with its high retail participation and emphasis on headline profitability, may exhibit similar or even more pronounced patterns. ## Models of Non-Discretionary Accruals {#sec-models} ### Notation and Setup Let $i$ index firms and $t$ index fiscal years. Define: | Symbol | Definition | |-------------------------------|-----------------------------------------| | $TA_{i,t}$ | Total accruals (scaled by lagged assets) | | $NDA_{i,t}$ | Non-discretionary accruals (model prediction) | | $DA_{i,t}$ | Discretionary accruals: $DA_{i,t} = TA_{i,t} - NDA_{i,t}$ | | $A_{i,t}$ | Total assets | | $\Delta Rev_{i,t}$ | Change in revenues, scaled by $A_{i,t-1}$ | | $\Delta Rec_{i,t}$ | Change in net receivables, scaled by $A_{i,t-1}$ | | $PPE_{i,t}$ | Gross property, plant, and equipment, scaled by $A_{i,t-1}$ | | $PART_{i,t}$ | Indicator equal to 1 for the test (event) year | : Notation for earnings management models {#tbl-em-notation} Total accruals are computed using the balance sheet approach: $$ TA_{i,t} = \frac{(\Delta CA_{i,t} - \Delta Cash_{i,t}) - (\Delta CL_{i,t} - \Delta STD_{i,t}) - Dep_{i,t}}{A_{i,t-1}} $$ {#eq-total-accruals} where $\Delta CA$ is the change in current assets, $\Delta Cash$ is the change in cash, $\Delta CL$ is the change in current liabilities, $\Delta STD$ is the change in short-term debt, and $Dep$ is depreciation expense. ### Five Models We implement five models, each estimating $NDA$ during a firm-specific estimation window and computing $DA$ for the test year as the residual. **Model 1**: @healy1985effect. 
Non-discretionary accruals equal the mean of total accruals during the estimation period: $$ NDA^{Healy}_{i,t} = \frac{1}{T} \sum_{s \in \text{est}} TA_{i,s} $$ {#eq-healy} This is the simplest possible benchmark. Its limitation is obvious: it treats *all* time-variation in accruals as discretionary, even variation driven by changes in the firm's economic environment. **Model 2**: @deangelo1986accounting. Non-discretionary accruals equal last period's total accruals: $$ NDA^{DeAngelo}_{i,t} = TA_{i,t-1} $$ {#eq-deangelo} This is equivalent to assuming that the change in total accruals is entirely discretionary. It performs well when accruals follow a random walk, but poorly when they exhibit mean-reversion or trend. **Model 3**: @jones1991earnings. This model controls for changes in a firm's economic environment by regressing accruals on revenue changes and the level of fixed assets: $$ TA_{i,t} = \alpha_1 \frac{1}{A_{i,t-1}} + \alpha_2 \Delta Rev_{i,t} + \alpha_3 PPE_{i,t} + \varepsilon_{i,t} $$ {#eq-jones} The parameters $\hat{\alpha}_1, \hat{\alpha}_2, \hat{\alpha}_3$ are estimated on the estimation-period data. Non-discretionary accruals for the test year are the fitted values from this regression applied to test-year covariates. The economic logic is that revenue growth generates legitimate working capital accruals (higher receivables and inventory), while fixed assets proxy for non-discretionary depreciation charges. The intercept is scaled by lagged assets rather than included as a conventional constant, following @jones1991earnings. **Model 4: Modified Jones** [@dechow1995detecting]. 
The modification adjusts revenue changes for changes in receivables during the test year, on the premise that credit revenue growth is more susceptible to manipulation than cash revenue growth: $$ NDA^{ModJones}_{i,t} = \hat{\alpha}_1 \frac{1}{A_{i,t-1}} + \hat{\alpha}_2 (\Delta Rev_{i,t} - \Delta Rec_{i,t}) + \hat{\alpha}_3 PPE_{i,t} $$ {#eq-mod-jones} The coefficients are still estimated from the *unadjusted* Jones model (@eq-jones) on estimation-period data, but receivables are subtracted from revenues only when computing fitted values for the test year. **Model 5: Industry Model.** This model assumes that the common component of accruals within an industry captures non-discretionary variation: $$ TA_{i,t} = \phi_0 + \phi_1 \cdot \widetilde{TA}_{j,t} + \eta_{i,t} $$ {#eq-industry} where $\widetilde{TA}_{j,t}$ is the median total accrual across all firms in industry $j$ (excluding firm $i$), estimated during the estimation period. ### Implementation ```{python} #| label: calc-accruals-fn #| code-summary: "Function to compute total accruals from balance sheet data" def calc_accruals(df: pd.DataFrame) -> pd.DataFrame: """ Compute total accruals using the balance sheet approach. 
Expected columns: gvkey/ticker, year/fyear, at, act, che, lct, dlc, dp, sale, rect, ppegt """ firm_col = "ticker" if "ticker" in df.columns else "gvkey" year_col = "year" if "year" in df.columns else "fyear" df = df.sort_values([firm_col, year_col]).copy() # Lagged values and changes g = df.groupby(firm_col) df["lag_at"] = g["at"].shift(1) df["d_ca"] = g["act"].diff() df["d_cash"] = g["che"].diff() df["d_cl"] = g["lct"].diff() df["d_std"] = g["dlc"].diff() df["d_rev"] = g["sale"].diff() df["d_rec"] = g["rect"].diff() # Total accruals (raw) df["acc_raw"] = (df["d_ca"] - df["d_cash"] - df["d_cl"] + df["d_std"]) - df["dp"] return df ``` ```{python} #| label: nda-models #| code-summary: "Implementations of five non-discretionary accrual models" def fit_healy(df: pd.DataFrame) -> pd.DataFrame: """Healy (1985): NDA = mean accruals in estimation period.""" est_mean = df.loc[~df["part"], "acc_at"].mean() df = df.copy() df["nda_healy"] = est_mean df["da_healy"] = df["acc_at"] - df["nda_healy"] return df def fit_deangelo(df: pd.DataFrame) -> pd.DataFrame: """DeAngelo (1986): NDA = prior-period total accruals.""" df = df.copy() df["nda_deangelo"] = df["acc_at"].shift(1) df["da_deangelo"] = df["acc_at"] - df["nda_deangelo"] return df def fit_jones(df: pd.DataFrame) -> pd.DataFrame: """Jones (1991): Regression-based NDA controlling for revenue and PPE.""" df = df.copy() est = df[~df["part"]].dropna(subset=["acc_at", "one_at", "d_rev_at", "ppe_at"]) if len(est) < 5: df["nda_jones"] = np.nan df["da_jones"] = np.nan return df y = est["acc_at"] X = est[["one_at", "d_rev_at", "ppe_at"]] model = sm.OLS(y, X).fit() pred_X = df[["one_at", "d_rev_at", "ppe_at"]].copy() df["nda_jones"] = model.predict(pred_X) df["da_jones"] = df["acc_at"] - df["nda_jones"] return df def fit_mod_jones(df: pd.DataFrame) -> pd.DataFrame: """Modified Jones (Dechow et al., 1995): Adjust revenues for receivables.""" df = df.copy() est = df[~df["part"]].dropna(subset=["acc_at", "one_at", "d_rev_at", "ppe_at"]) 
if len(est) < 5: df["nda_mod_jones"] = np.nan df["da_mod_jones"] = np.nan return df # Estimate on unadjusted Jones model y = est["acc_at"] X = est[["one_at", "d_rev_at", "ppe_at"]] model = sm.OLS(y, X).fit() # Predict using adjusted revenue (subtract receivable changes) pred_X = df[["one_at", "d_rev_alt_at", "ppe_at"]].copy() pred_X.columns = ["one_at", "d_rev_at", "ppe_at"] df["nda_mod_jones"] = model.predict(pred_X) df["da_mod_jones"] = df["acc_at"] - df["nda_mod_jones"] return df def fit_industry(df: pd.DataFrame) -> pd.DataFrame: """Industry model: NDA = f(industry median accruals).""" df = df.copy() est = df[~df["part"]].dropna(subset=["acc_at", "acc_ind"]) if len(est) < 5: df["nda_industry"] = np.nan df["da_industry"] = np.nan return df y = est["acc_at"] X = sm.add_constant(est["acc_ind"]) model = sm.OLS(y, X).fit() pred_X = sm.add_constant(df["acc_ind"]) df["nda_industry"] = model.predict(pred_X) df["da_industry"] = df["acc_at"] - df["nda_industry"] return df ``` ```{python} #| label: get-nda-all #| code-summary: "Master function to apply all five models to a panel" def prepare_model_vars(df: pd.DataFrame) -> pd.DataFrame: """Add scaled variables needed by the five NDA models.""" df = calc_accruals(df) firm_col = "ticker" if "ticker" in df.columns else "gvkey" df["sic2"] = df["sic"].astype(str).str[:2] df["acc_at"] = df["acc_raw"] / df["lag_at"] df["one_at"] = 1.0 / df["lag_at"] df["d_rev_at"] = df["d_rev"] / df["lag_at"] df["d_rev_alt_at"] = (df["d_rev"] - df["d_rec"]) / df["lag_at"] df["ppe_at"] = df["ppegt"] / df["lag_at"] # Industry median accruals (estimation period only) # est_acc = df.loc[~df["part"], ["sic2", "acc_at"]].copy() est_acc = df.loc[df["part"] == False, ["sic2", "acc_at"]].copy() ind_median = est_acc.groupby("sic2")["acc_at"].median().rename("acc_ind") df = df.merge(ind_median, on="sic2", how="left") return df def get_all_nda(df: pd.DataFrame) -> pd.DataFrame: """ Apply all five NDA models on a firm-by-firm basis. 
Returns a DataFrame with DA columns for each model. """ firm_col = "ticker" if "ticker" in df.columns else "gvkey" year_col = "year" if "year" in df.columns else "fyear" df_mod = prepare_model_vars(df) df_mod["part"] = df_mod["part"].astype(bool) results = [] for firm, group in df_mod.groupby(firm_col): g = group.sort_values(year_col).copy() g = fit_healy(g) g = fit_deangelo(g) g = fit_jones(g) g = fit_mod_jones(g) g = fit_industry(g) results.append(g) return pd.concat(results, ignore_index=True) ``` ### The Salkever (1976) Correction An important but underappreciated methodological issue arises when computing standard errors for discretionary accruals under the Jones-type models. In the standard two-stage procedure, the researcher first estimates the Jones model on the estimation period, then computes discretionary accruals for the test year as a *prediction error*. But the standard error of a prediction error is larger than the standard error of a fitted residual, because it incorporates parameter uncertainty from the first-stage estimation. Ignoring this distinction leads to understated standard errors and inflated rejection rates,exactly the problem documented in @dechow1995detecting. @salkever1976use provides an elegant solution: run a single regression on the *combined* estimation and test periods, including a dummy variable $PART$ for the test year. The coefficient on $PART$ equals the prediction error (discretionary accruals), and its standard error correctly accounts for first-stage estimation uncertainty. For the Jones model, the Salkever single-stage regression is: $$ TA_{i,t} = \alpha_1 \frac{1}{A_{i,t-1}} + \alpha_2 \Delta Rev_{i,t} + \alpha_3 PPE_{i,t} + \delta \cdot PART_{i,t} + u_{i,t} $$ {#eq-salkever} The coefficient $\hat{\delta}$ is numerically identical to the two-stage $DA$ estimate, but its standard error $\text{se}(\hat{\delta})$ is the correct prediction error standard error. 
```{python}
#| label: salkever-demo
#| code-summary: "Demonstration of the Salkever (1976) equivalence"
def demonstrate_salkever(df_firm: pd.DataFrame) -> pd.DataFrame:
    """
    For a single firm, show that the two-stage Jones DA equals
    the Salkever one-stage coefficient on PART.
    """
    df = prepare_model_vars(df_firm)
    needed = ["acc_at", "one_at", "d_rev_at", "ppe_at", "part"]
    df = df.dropna(subset=needed).copy()
    df["part"] = df["part"].astype(bool)

    # --- Two-stage approach ---
    est = df[~df["part"]]
    y_est = est["acc_at"]
    X_est = est[["one_at", "d_rev_at", "ppe_at"]]
    fm_stage1 = sm.OLS(y_est, X_est).fit()

    df["nda_two_stage"] = fm_stage1.predict(df[["one_at", "d_rev_at", "ppe_at"]])
    df["da_two_stage"] = df["acc_at"] - df["nda_two_stage"]

    # Test-year DA from two-stage
    da_two_stage = df.loc[df["part"], "da_two_stage"].values

    # --- Salkever one-stage ---
    df["part_float"] = df["part"].astype(float)
    y_full = df["acc_at"]
    X_full = df[["one_at", "d_rev_at", "ppe_at", "part_float"]]
    fm_salkever = sm.OLS(y_full, X_full).fit()

    da_salkever = fm_salkever.params["part_float"]
    se_salkever = fm_salkever.bse["part_float"]

    # The two-stage procedure does not deliver a correct SE, hence the labels
    return pd.DataFrame({
        "Method": ["Two-stage Jones", "Salkever one-stage"],
        "DA estimate": [da_two_stage[0] if len(da_two_stage) else np.nan,
                        da_salkever],
        "Correct SE": ["Not available", f"{se_salkever:.6f}"],
        "t-statistic": ["Biased",
                        f"{fm_salkever.tvalues['part_float']:.4f}"],
    })
```

## Type I Error Under the Null Hypothesis {#sec-type1}

### Experimental Design

To evaluate whether the five models produce well-calibrated test statistics, we conduct a simulation experiment parallel to Table 2 of @dechow1995detecting. The procedure is:

1. Generate a panel of $N$ firms, each with $T$ years of financial statement data.
2. For each firm, randomly designate one year as the test year ($PART = 1$). By construction, *no earnings management occurs* in this year.
3. Estimate discretionary accruals using each of the five models.
4. Regress $DA$ on $PART$ for each firm and record whether the null $H_0: \delta = 0$ is rejected at the 5% and 1% significance levels.
5. Compute the rejection rate across all $N$ firms.

If the model is well-specified, rejection rates should equal the nominal test size (5% or 1%). Systematic over-rejection indicates that the model produces biased test statistics, a critical flaw for research that relies on these measures to draw causal inferences.

### Data Generation

We generate synthetic panel data that preserves the key cross-sectional and time-series properties of Vietnamese listed firms while allowing us to know with certainty that no manipulation exists.

```{python}
#| label: gen-panel
#| code-summary: "Generate synthetic Vietnamese panel data for Type I error analysis"
def generate_em_panel(
    n_firms: int = 500,
    n_years: int = 15,
    seed: int = 2024,
) -> pd.DataFrame:
    """
    Generate a synthetic panel of Vietnamese-style financial data.
    No earnings management is present by construction.

    The data generation process captures:
    - AR(1) revenue process
    - Accruals driven by revenue growth and PPE levels
    - Industry-level common shocks
    - SOE/non-SOE heterogeneity
    """
    rng = np.random.default_rng(seed)
    industries = [10, 20, 25, 41, 46, 47, 52, 62, 64, 68]
    records = []

    for i in range(n_firms):
        # Firm characteristics
        sic = rng.choice(industries)
        is_soe = int(rng.random() < 0.30)
        base_assets = rng.lognormal(mean=12, sigma=1.5)  # VND billions
        growth_rate = rng.normal(0.08, 0.04)

        # Jones model parameters (firm-specific true DGP)
        true_alpha1 = rng.normal(0, 0.02)
        true_alpha2 = rng.normal(0.06, 0.03)   # revenue-accrual sensitivity
        true_alpha3 = rng.normal(-0.04, 0.02)  # depreciation effect

        at_prev = base_assets
        sale_prev = base_assets * rng.uniform(0.5, 1.5)

        # Opening balance-sheet levels; later years cumulate simulated changes
        act_prev = at_prev * rng.uniform(0.3, 0.6)
        che_prev = act_prev * rng.uniform(0.05, 0.2)
        lct_prev = at_prev * rng.uniform(0.15, 0.35)
        dlc_prev = lct_prev * rng.uniform(0.1, 0.4)

        for t in range(n_years):
            year = 2009 + t

            # Evolve fundamentals
            at = at_prev * (1 + growth_rate + rng.normal(0, 0.05))
            sale = sale_prev * (1 + rng.normal(0.06, 0.08))
            rect = sale * rng.uniform(0.05, 0.25)
            ppegt = at * rng.uniform(0.3, 0.7)

            # Generate accruals from true Jones DGP + noise
            d_rev = sale - sale_prev
            one_at = 1.0 / at_prev
            d_rev_at = d_rev / at_prev
            ppe_at = ppegt / at_prev
            acc_at = (true_alpha1 * one_at
                      + true_alpha2 * d_rev_at
                      + true_alpha3 * ppe_at
                      + rng.normal(0, 0.03))
            acc_raw = acc_at * at_prev

            # Reverse-engineer balance sheet changes consistent with accruals,
            # assuming calc_accruals uses the balance-sheet definition
            # acc = (d_ca - d_cash) - (d_cl - d_std) - dp
            dp = ppegt * rng.uniform(0.05, 0.12)
            d_cl = rng.normal(0, at * 0.02)
            d_std = rng.normal(0, at * 0.01)
            d_cash = rng.normal(0, at * 0.02)
            d_ca = acc_raw + dp + d_cl - d_std + d_cash

            # Levels cumulate the changes so that year-over-year differences
            # in the recorded items reproduce acc_raw
            act = act_prev + d_ca
            che = che_prev + d_cash
            lct = lct_prev + d_cl
            dlc = dlc_prev + d_std

            ni = sale * rng.uniform(0.03, 0.12)
            ib = ni

            records.append({
                "ticker": f"VN{i:04d}",
                "fyear": year,
                "at": at,
                "act": act,
                "che": che,
                "lct": lct,
                "dlc": dlc,
                "dp": dp,
                "sale": sale,
                "rect": rect,
                "ppegt": ppegt,
                "ni": ni,
                "ib": ib,
                "sic": sic,
                "is_soe": is_soe,
            })

            at_prev = at
            sale_prev = sale
            act_prev, che_prev = act, che
            lct_prev, dlc_prev = lct, dlc

    df = pd.DataFrame(records)
    return df


panel_raw = generate_em_panel(n_firms=500, n_years=15, seed=2024)
print(f"Panel: {panel_raw.shape[0]:,} firm-years, "
      f"{panel_raw['ticker'].nunique()} firms")
```

### Sample Construction

We construct a sample of 500 firms, each with a randomly designated test year, mirroring the design of @dechow1995detecting.

```{python}
#| label: sample-construction
#| code-summary: "Construct analysis sample with random test-year assignment"
from typing import Callable


def construct_sample(
    df: pd.DataFrame,
    n_sample: int = 500,
    min_est_years: int = 10,
    seed: int = 42,
    selection_filter: Optional[Callable[[pd.DataFrame], pd.DataFrame]] = None,
) -> pd.DataFrame:
    """
    For each firm, randomly assign one year as the test year (part=True).
    Require at least min_est_years of estimation data.

    If selection_filter is given, the rows it returns are the firm-years
    eligible to be test years; each sampled firm keeps its full history
    so that the estimation period is not truncated.
    """
    rng = np.random.default_rng(seed)
    firm_col = "ticker"
    year_col = "fyear"

    # Compute accruals and filter for data availability
    df = calc_accruals(df)
    required = ["acc_raw", "lag_at", "d_rev", "d_rec", "ppegt"]
    df = df.dropna(subset=required)
    df = df[df["lag_at"] > 0]

    # Require minimum years
    firm_counts = df.groupby(firm_col)[year_col].count()
    eligible = firm_counts[firm_counts >= (min_est_years + 1)].index
    df = df[df[firm_col].isin(eligible)]

    # Candidate test firm-years (all firm-years when no filter is supplied)
    candidates = df if selection_filter is None else selection_filter(df)
    candidates = candidates[[firm_col, year_col]]

    # Sample n_sample firms among those with at least one candidate year
    firms = candidates[firm_col].unique()
    if len(firms) > n_sample:
        firms = rng.choice(firms, n_sample, replace=False)
    df = df[df[firm_col].isin(firms)].copy()
    candidates = candidates[candidates[firm_col].isin(firms)]

    # For each firm, randomly pick one test year (not the first year)
    first_year = df.groupby(firm_col)[year_col].min()
    parts = []
    for firm, group in candidates.groupby(firm_col):
        years = np.sort(group[year_col].values)
        years = years[years > first_year[firm]]
        if len(years) == 0:
            continue
        test_year = rng.choice(years)
        parts.append({firm_col: firm, year_col: test_year, "part": True})

    part_df = pd.DataFrame(parts)
    df = df.merge(part_df, on=[firm_col, year_col], how="left")
    # Unmatched rows are NaN after the merge; treat them as estimation years
    df["part"] = df["part"].notna()
    return df


sample_1 = construct_sample(panel_raw, n_sample=500, seed=2024)
print(f"Sample 1: {sample_1.shape[0]:,} firm-years, "
      f"{sample_1['ticker'].nunique()} firms, "
      f"{sample_1['part'].sum()} test years")
```

### Estimating Discretionary Accruals

```{python}
#| label: estimate-da
#| code-summary: "Apply all five models to Sample 1"
da_results = get_all_nda(sample_1)

# Verify: peek at test-year DA across models
da_cols = ["da_healy", "da_deangelo", "da_jones", "da_mod_jones", "da_industry"]
test_da = da_results[da_results["part"]][da_cols]
print("Test-year discretionary accruals (first 5 firms):")
test_da.head().round(4)
```

### Firm-Level Regressions and Rejection Rates

For each firm and each model, we regress $DA$ on $PART$ and record whether the null hypothesis of zero discretionary accruals in the test year is rejected.
```{python}
#| label: type1-regressions
#| code-summary: "Compute firm-specific t-statistics and rejection rates"
def firm_regressions(df: pd.DataFrame, models: list[str]) -> pd.DataFrame:
    """
    For each firm and model, regress DA on PART.
    Return coefficients, std errors, t-stats, and rejection indicators.
    """
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    records = []

    for firm, group in df.groupby(firm_col):
        for model in models:
            da_col = f"da_{model}"
            g = group.dropna(subset=[da_col]).copy()
            g["part_float"] = g["part"].astype(float)

            if len(g) < 5 or g["part"].sum() == 0:
                continue

            try:
                fm = sm.OLS(
                    g[da_col],
                    sm.add_constant(g["part_float"])
                ).fit()

                coef = fm.params["part_float"]
                se = fm.bse["part_float"]
                t_stat = fm.tvalues["part_float"]
                df_resid = fm.df_resid

                # One-sided p-values
                p_neg = stats.t.cdf(t_stat, df_resid)
                p_pos = 1 - p_neg

                records.append({
                    firm_col: firm,
                    "model": model,
                    "coef": coef,
                    "se": se,
                    "t_stat": t_stat,
                    "neg_p01": p_neg < 0.01,
                    "neg_p05": p_neg < 0.05,
                    "pos_p01": p_pos < 0.01,
                    "pos_p05": p_pos < 0.05,
                })
            except Exception:
                continue

    return pd.DataFrame(records)


models = ["healy", "deangelo", "jones", "mod_jones", "industry"]
reg_results = firm_regressions(da_results, models)
```

### Results

@tbl-type1-coefs reports the distribution of estimated coefficients on $PART$ across firms. Under the null of no manipulation, we expect the mean coefficient to be approximately zero.

```{python}
#| label: tbl-type1-coefs
#| tbl-cap: "Distribution of firm-level discretionary accrual estimates (coefficient on PART) under the null hypothesis of no earnings management. All five models produce mean estimates near zero, as expected."
coef_stats = (
    reg_results
    .groupby("model")["coef"]
    .agg(["mean", "std",
          lambda x: x.quantile(0.25),
          "median",
          lambda x: x.quantile(0.75)])
)
coef_stats.columns = ["Mean", "Std Dev", "Q1", "Median", "Q3"]
coef_stats.index.name = "Model"
coef_stats.round(4)
```

@tbl-type1-rejection reports rejection rates.
The critical comparison is whether these rates approximate the nominal test size.

```{python}
#| label: tbl-type1-rejection
#| tbl-cap: "Type I error rates for one-sided tests of earnings management under the null hypothesis. Rates exceeding the nominal size (5% or 1%) indicate that the model over-rejects, a significant concern for the Jones and Modified Jones models."
rejection_rates = (
    reg_results
    .groupby("model")[["neg_p01", "neg_p05", "pos_p01", "pos_p05"]]
    .mean()
    .round(4)
)
rejection_rates.columns = [
    "Neg (1%)", "Neg (5%)", "Pos (1%)", "Pos (5%)"
]
rejection_rates.index.name = "Model"
rejection_rates
```

### Binomial Test for Size Distortion

We formally test whether observed rejection rates differ from nominal sizes using a two-sided binomial test. Small $p$-values indicate significant mis-calibration of the test statistic.

```{python}
#| label: tbl-binom-test
#| tbl-cap: "Binomial test p-values for whether rejection rates equal nominal test sizes. Small values (e.g., < 0.05) indicate statistically significant size distortion."
def binom_test_rate(series: pd.Series, nominal: float) -> float:
    """Two-sided binomial test for rejection rate = nominal."""
    x = series.dropna()
    k = int(x.sum())
    n = len(x)
    if n == 0:
        return np.nan
    return stats.binomtest(k, n, nominal, alternative="two-sided").pvalue


binom_results = {}
for model, group in reg_results.groupby("model"):
    binom_results[model] = {
        "Neg (1%)": binom_test_rate(group["neg_p01"], 0.01),
        "Neg (5%)": binom_test_rate(group["neg_p05"], 0.05),
        "Pos (1%)": binom_test_rate(group["pos_p01"], 0.01),
        "Pos (5%)": binom_test_rate(group["pos_p05"], 0.05),
    }

binom_df = pd.DataFrame(binom_results).T.round(4)
binom_df.index.name = "Model"
binom_df
```

::: callout-warning
## Interpreting Over-Rejection

If the Jones and Modified Jones models show rejection rates significantly above 5%, this signals that the standard two-stage procedure produces anti-conservative test statistics.
The @salkever1976use correction addresses this by computing standard errors that reflect first-stage estimation uncertainty. In practical research on Vietnamese firms, where firm-level time series are often short (10–15 years of listed history), this correction is especially important because parameter uncertainty accounts for a larger share of prediction-error variance when there are fewer estimation-period observations.
:::

## Extreme Performance Firms {#sec-extreme}

A well-known weakness of accrual-based models is their poor performance when test firms experience extreme economic performance. @dechow1995detecting documented that all five models over-reject the null hypothesis when test firm-years are drawn from the tails of the earnings or cash flow distribution. @kothari2005performance subsequently proposed "performance matching" as a partial remedy.

The intuition for the problem is straightforward: the Jones model assumes a linear, symmetric relationship between revenue changes and accruals. But firms experiencing extreme growth or contraction generate accruals that deviate nonlinearly from the model's predictions, even absent any manipulation. This nonlinearity is misattributed to discretionary accruals.
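A minimal simulation illustrates the mechanism. It rests on a purely illustrative assumption that is not taken from the chapter's data: true accruals are convex in scaled revenue changes (the quadratic term and all coefficients below are hypothetical), and no manipulation is present. A linear model fitted to these data leaves residuals, which would be read as discretionary accruals, that are systematically positive in both tails of the revenue-change distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Scaled revenue changes and a hypothetical convex accrual process (no manipulation)
d_rev = rng.normal(0.05, 0.15, size=n)
acc = 0.06 * d_rev + 0.30 * d_rev**2 + rng.normal(0, 0.03, size=n)

# Fit a linear model in revenue changes, as a Jones-type specification would
X = np.column_stack([np.ones(n), d_rev])
b, *_ = np.linalg.lstsq(X, acc, rcond=None)
da = acc - X @ b                      # residual = estimated "discretionary" accruals

# Average residual in the bottom decile, the middle, and the top decile
lo, hi = np.quantile(d_rev, [0.10, 0.90])
mean_da_low = da[d_rev <= lo].mean()
mean_da_mid = da[(d_rev > lo) & (d_rev < hi)].mean()
mean_da_high = da[d_rev >= hi].mean()

print(f"mean DA, bottom decile: {mean_da_low:.4f}")
print(f"mean DA, middle 80%:    {mean_da_mid:.4f}")
print(f"mean DA, top decile:    {mean_da_high:.4f}")
```

The sign pattern depends on the assumed curvature; with a concave process the tail residuals would be negative instead. Either way, a test that samples firm-years from the tails inherits a nonzero mean under the null.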
### Constructing Extreme-Performance Samples

```{python}
#| label: extreme-sample
#| code-summary: "Construct samples from top and bottom earnings deciles"
def add_earnings_deciles(df: pd.DataFrame) -> pd.DataFrame:
    """Compute firm-year earnings (scaled) and assign to deciles."""
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    df = df.copy()
    g = df.groupby(firm_col)
    df["lag_at_earn"] = g["at"].shift(1)
    df["earn"] = df["ib"] / df["lag_at_earn"]
    df["earn_decile"] = pd.qcut(
        df["earn"], 10, labels=False, duplicates="drop"
    ) + 1
    return df


panel_with_earn = add_earnings_deciles(panel_raw)


# High-earners sample (top decile)
def filter_high_earn(df):
    df = add_earnings_deciles(df)
    return df[df["earn_decile"] == 10]


# Low-earners sample (bottom decile)
def filter_low_earn(df):
    df = add_earnings_deciles(df)
    return df[df["earn_decile"] == 1]


sample_high = construct_sample(
    panel_raw, n_sample=300, seed=100,
    selection_filter=filter_high_earn
)
sample_low = construct_sample(
    panel_raw, n_sample=300, seed=200,
    selection_filter=filter_low_earn
)

print(f"High-earnings sample: {sample_high['ticker'].nunique()} firms")
print(f"Low-earnings sample: {sample_low['ticker'].nunique()} firms")
```

```{python}
#| label: tbl-extreme-rejection
#| tbl-cap: "Type I error rates (5% one-sided) when test firm-years are drawn from extreme earnings deciles. Over-rejection is expected because accrual models misattribute performance-driven accrual variation to managerial discretion."
extreme_results = {}
for label, sample_df in [("High earners", sample_high),
                         ("Low earners", sample_low)]:
    da = get_all_nda(sample_df)
    regs = firm_regressions(da, models)
    rates = regs.groupby("model")[["neg_p05", "pos_p05"]].mean().round(4)
    rates.columns = [f"{label} Neg(5%)", f"{label} Pos(5%)"]
    extreme_results[label] = rates

extreme_df = pd.concat(extreme_results.values(), axis=1)
extreme_df.index.name = "Model"
extreme_df
```

### Performance Matching

@kothari2005performance propose adjusting discretionary accruals by subtracting the DA of a performance-matched firm (one in the same industry with similar ROA). This removes the systematic component of accruals correlated with performance.

The matched discretionary accrual is:

$$
DA^{PM}_{i,t} = DA_{i,t} - DA_{i^*,t}
$$ {#eq-performance-match}

where $i^*$ is the matched control firm. This approach is particularly relevant in Vietnam, where the cross-section of listed firms includes many high-growth firms alongside stagnant SOEs, a degree of performance heterogeneity that standard models may mischaracterize.

## Power Analysis {#sec-power}

### Artificially Introducing Earnings Management

A model that never rejects is useless even if its Type I error rate is perfect. We need to evaluate **power**: the probability of detecting manipulation when it truly exists. Following @dechow1995detecting, we introduce known artificial manipulation into test-year financial statements and measure detection rates.

We consider three forms of manipulation at varying magnitudes:

1. **Expense manipulation.** Decrease current liabilities by the manipulation amount (e.g., delaying recognition of accrued expenses):
   $$LCT'_{i,t} = LCT_{i,t} - \lambda \cdot A_{i,t-1}$$

2. **Revenue manipulation.** Increase sales and receivables by the manipulation amount (e.g., premature revenue recognition or channel stuffing):
   $$Sale'_{i,t} = Sale_{i,t} + \lambda \cdot A_{i,t-1}, \quad Rect'_{i,t} = Rect_{i,t} + \lambda \cdot A_{i,t-1}$$

3. **Margin manipulation.** Increase sales by the gross amount needed to inflate net income by $\lambda \cdot A_{i,t-1}$, increasing both receivables and current liabilities proportionally.

The parameter $\lambda$ represents manipulation as a fraction of lagged total assets, ranging from 0% to 50%.

```{python}
#| label: manipulate-fn
#| code-summary: "Function to introduce artificial manipulation"
def manipulate(
    df: pd.DataFrame,
    level: float = 0.0,
    manip_type: str = "expense",
) -> pd.DataFrame:
    """
    Introduce artificial earnings management of a given type and level
    into test-year (part=True) observations.

    Parameters
    ----------
    df : DataFrame with 'part' indicator and required financial variables.
    level : Manipulation as fraction of lagged total assets.
    manip_type : One of 'expense', 'revenue', 'margin'.
    """
    firm_col = "ticker" if "ticker" in df.columns else "gvkey"
    year_col = "fyear" if "fyear" in df.columns else "year"

    df = df.sort_values([firm_col, year_col]).copy()
    df["part"] = df["part"].astype(bool)
    g = df.groupby(firm_col)
    lag_at = g["at"].shift(1)
    manip_amt = lag_at * level

    if manip_type == "expense":
        # Decrease current liabilities in test year
        df.loc[df["part"], "lct"] -= manip_amt[df["part"]]

    elif manip_type == "revenue":
        # Increase sales and receivables in test year
        df.loc[df["part"], "sale"] += manip_amt[df["part"]]
        df.loc[df["part"], "rect"] += manip_amt[df["part"]]
        df.loc[df["part"], "act"] += manip_amt[df["part"]]
        # Reverse in the following year by the amount booked in the test year
        next_part = g["part"].shift(1, fill_value=False).astype(bool)
        reversal = manip_amt.groupby(df[firm_col]).shift(1)
        df.loc[next_part, "sale"] -= reversal[next_part]

    elif manip_type == "margin":
        # Firm-level median net income ratio over the estimation years
        est_rows = df[~df["part"]]
        ni_ratio = (
            (est_rows["ni"] / est_rows["sale"])
            .groupby(est_rows[firm_col])
            .median()
        )
        df["_ni_ratio"] = df[firm_col].map(ni_ratio)
        # Gross sales needed to raise net income by manip_amt
        gross_amt = pd.Series(
            np.where(df["_ni_ratio"] > 0, manip_amt / df["_ni_ratio"], 0.0),
            index=df.index,
        )
        df.loc[df["part"], "sale"] += gross_amt[df["part"]]
        df.loc[df["part"], "rect"] += gross_amt[df["part"]]
        df.loc[df["part"], "act"] += gross_amt[df["part"]]
        net_effect = gross_amt - manip_amt
        df.loc[df["part"], "lct"] += net_effect[df["part"]]
        df.drop(columns=["_ni_ratio"], inplace=True)

    return df
```

```{python}
#| label: power-analysis
#| eval: false
#| code-summary: "Power analysis across manipulation levels and types"
levels = [0.0, 0.05, 0.10, 0.20, 0.30, 0.50]
manip_types = ["expense", "revenue", "margin"]

power_records = []
for level in levels:
    for mtype in manip_types:
        # Copy base sample and introduce manipulation
        manip_sample = sample_1.copy()
        if level > 0:
            manip_sample = manipulate(manip_sample, level=level,
                                      manip_type=mtype)

        # Estimate DA and run tests
        da = get_all_nda(manip_sample)
        regs = firm_regressions(da, models)

        # Power = positive one-sided rejection rate at 5%
        power = regs.groupby("model")["pos_p05"].mean()
        for model_name, pwr in power.items():
            power_records.append({
                "level": level,
                "type": mtype,
                "model": model_name,
                "power": pwr,
            })

power_df = pd.DataFrame(power_records)
```

```{python}
#| label: power-analysis-parallel
#| code-summary: "Power analysis across manipulation levels and types"
from joblib import Parallel, delayed

levels = [0.0, 0.05, 0.10, 0.20, 0.30, 0.50]
manip_types = ["expense", "revenue", "margin"]


def _run_one(level, mtype, base_sample, models):
    """Run one (level, type) cell of the power grid."""
    manip_sample = base_sample.copy()
    if level > 0:
        manip_sample = manipulate(manip_sample, level=level, manip_type=mtype)
    da = get_all_nda(manip_sample)
    regs = firm_regressions(da, models)
    power = regs.groupby("model")["pos_p05"].mean()
    return [
        {"level": level, "type": mtype, "model": m, "power": p}
        for m, p in power.items()
    ]


# Skip redundant level=0 runs (all manip_types identical when level=0)
combos = [(0.0, "expense")] + [
    (l, m) for l in levels if l > 0 for m in manip_types
]

results = Parallel(n_jobs=-1, verbose=1)(
    delayed(_run_one)(lvl, mt, sample_1, models) for lvl, mt in combos
)

# Expand level=0 result across all manip_types
flat = []
for (lvl, mt), batch in zip(combos, results):
    if lvl == 0:
        for mtype in manip_types:
            flat.extend([{**r, "type": mtype} for r in batch])
    else:
        flat.extend(batch)

power_df = pd.DataFrame(flat)
```

### Power Functions

@fig-power plots the estimated power functions. The key questions are: (i) which model has the highest power for a given manipulation level and type, and (ii) at what magnitudes does manipulation become reliably detectable?

```{python}
#| label: fig-power
#| fig-cap: "Power functions for the five earnings management detection models across three manipulation types. Power increases with manipulation magnitude but remains low for economically plausible levels (< 10% of assets), highlighting the fundamental difficulty of detecting earnings management."
fig, axes = plt.subplots(1, 3, figsize=(14, 5), sharey=True)

model_colors = {
    "healy": "#2196F3",
    "deangelo": "#FF9800",
    "jones": "#4CAF50",
    "mod_jones": "#E91E63",
    "industry": "#9C27B0",
}

for idx, mtype in enumerate(manip_types):
    ax = axes[idx]
    subset = power_df[power_df["type"] == mtype]
    for model_name in models:
        m_data = subset[subset["model"] == model_name].sort_values("level")
        ax.plot(
            m_data["level"] * 100, m_data["power"],
            marker="o", markersize=4,
            label=model_name.replace("_", " ").title(),
            color=model_colors[model_name],
            linewidth=1.5,
        )
    ax.set_title(mtype.title(), fontweight="bold")
    ax.set_xlabel("Manipulation (% of assets)")
    ax.set_ylim(-0.02, 1.02)
    ax.yaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
    ax.axhline(0.05, color="grey", linestyle="--", linewidth=0.7,
               label="5% size")

axes[0].set_ylabel("Rejection Rate (Power)")
axes[2].legend(fontsize=8, loc="center left", bbox_to_anchor=(1.02, 0.5))
plt.tight_layout()
plt.show()
```

### Summary Statistics

@tbl-power-summary reports power at selected manipulation levels for the Jones and Modified Jones models, which are most commonly used in applied research.

```{python}
#| label: tbl-power-summary
#| tbl-cap: "Power of the Jones and Modified Jones models at selected manipulation levels. Even at 10% of assets, a large amount of manipulation, detection rates remain well below 50% for most manipulation types, underscoring the low statistical power of standard tests."
power_summary = (
    power_df[power_df["model"].isin(["jones", "mod_jones"])]
    .pivot_table(index=["model", "level"], columns="type", values="power")
    .round(3)
)
# Levels are stored as fractions of lagged assets (0.10 = 10%)
power_summary.index.names = ["Model", "Level (fraction of assets)"]
power_summary
```

::: callout-important
## Implications for Vietnamese Research

The power analysis has stark implications for earnings management research in Vietnam. The typical Vietnamese listed firm has been listed for 10–15 years, providing far fewer estimation-period observations than in the U.S. context where @dechow1995detecting had decades of Compustat data. Shorter estimation windows increase parameter uncertainty in the Jones model, further reducing power. Combined with the noisier financial data common in emerging markets, researchers should interpret non-rejection of the null as uninformative rather than as evidence of clean financial reporting.
:::

## Vietnamese Institutional Context {#sec-vietnam}

### Channels of Earnings Management

Several features of the Vietnamese institutional environment create distinctive earnings management incentives and opportunities:

**Tax-driven manipulation.** Vietnamese corporate income tax (CIT) rates have declined from 28% (pre-2009) to 20% (2016 onwards), with preferential rates for firms in Special Economic Zones and high-tech sectors. The close alignment between VAS accounting and tax accounting creates incentives to manage earnings downward to reduce tax obligations, a pattern documented in developing economies with code-law accounting traditions [@ball2003incentives].

**IPO and seasoned equity offering (SEO) incentives.** Vietnam has experienced several waves of SOE equitization (partial privatization). Managers have incentives to inflate earnings before share offerings to maximize proceeds.
The SSC requires minimum profitability thresholds for listing eligibility, creating sharp incentives around these regulatory cutoffs and a natural setting for the discontinuity analysis of @burgstahler1997earnings.

**Real earnings management in manufacturing.** @roychowdhury2006earnings identifies overproduction, discretionary expenditure cuts, and sales manipulation as the three main channels of real earnings management (REM). Vietnam's large manufacturing sector (textiles, electronics assembly, food processing) provides ample scope for overproduction-based REM, where unit costs are reduced by spreading fixed overhead across larger production runs.

**Related-party transactions.** Transactions with affiliated entities are a well-documented channel for earnings manipulation in Asian markets. Vietnamese conglomerates (*tập đoàn*) often feature complex cross-ownership structures where transfer pricing between subsidiaries can shift profits across reporting entities.

### The Earnings Distribution Test

@burgstahler1997earnings observed a striking discontinuity in the distribution of reported earnings around zero: far more firms report small positive earnings than small losses, relative to what a smooth distribution would predict. This pattern is interpreted as evidence that firms manage earnings to avoid reporting losses.

We apply this test to Vietnamese-style data to illustrate the methodology.

```{python}
#| label: fig-earnings-distribution
#| fig-cap: "Distribution of scaled earnings around zero. A discontinuity (excess density just above zero and a deficit just below) would suggest benchmark-beating behavior. The red dashed line marks the zero threshold."
# Generate earnings for a larger sample with benchmark-beating behavior
rng_dist = np.random.default_rng(2024)
n_firms_dist = 2000
n_years_dist = 10

earnings = []
for i in range(n_firms_dist):
    base_earn = rng_dist.normal(0.06, 0.08)
    for t in range(n_years_dist):
        e = base_earn + rng_dist.normal(0, 0.04)
        # Simulate benchmark-beating: with probability 0.6, firms near zero
        # bump earnings up
        if -0.01 < e < 0.005:
            e += rng_dist.uniform(0.005, 0.015) * (rng_dist.random() < 0.6)
        earnings.append(e)

earn_array = np.array(earnings)

fig, ax = plt.subplots(figsize=(8, 5))
bins = np.arange(-0.30, 0.35, 0.01)
ax.hist(earn_array, bins=bins, color="steelblue", edgecolor="white",
        alpha=0.85, density=True)
ax.axvline(0, color="firebrick", linestyle="--", linewidth=1.5,
           label="Zero threshold")
ax.set_xlabel("Earnings / Total Assets")
ax.set_ylabel("Density")
ax.set_title("Earnings Distribution Around Zero")
ax.legend()
plt.tight_layout()
plt.show()
```

```{python}
#| label: fig-earnings-change-dist
#| fig-cap: "Distribution of year-over-year earnings changes around zero. A discontinuity here suggests that firms manage earnings to avoid reporting declines, even when small."
# Year-over-year changes (one row per firm, one column per year)
earn_changes = np.diff(
    earn_array.reshape(n_firms_dist, n_years_dist), axis=1
).flatten()

fig, ax = plt.subplots(figsize=(8, 5))
bins_chg = np.arange(-0.15, 0.15, 0.005)
ax.hist(earn_changes, bins=bins_chg, color="darkorange", edgecolor="white",
        alpha=0.85, density=True)
ax.axvline(0, color="firebrick", linestyle="--", linewidth=1.5,
           label="Zero change")
ax.set_xlabel("Change in Earnings / Total Assets")
ax.set_ylabel("Density")
ax.set_title("Earnings Change Distribution Around Zero")
ax.legend()
plt.tight_layout()
plt.show()
```

### Accrual-Based vs. Real Earnings Management

@cohen2008real document a shift from accrual-based earnings management (AEM) to real earnings management (REM) following the passage of the Sarbanes-Oxley Act (SOX) in 2002, suggesting that tighter regulatory scrutiny redirects manipulation toward less detectable channels.
@zang2012evidence provides further evidence that firms trade off between AEM and REM based on their relative costs.

In Vietnam, where regulatory enforcement of accounting standards is weaker than in post-SOX America, we might expect AEM to remain the dominant channel. However, as Vietnam moves toward IFRS adoption and strengthens SSC oversight, the AEM-to-REM substitution hypothesis becomes testable.

We can measure REM using the @roychowdhury2006earnings approach. Three proxies capture different manipulation channels:

**Abnormal cash flow from operations.** Estimate normal CFO as a function of sales and sales changes:

$$
\frac{CFO_{i,t}}{A_{i,t-1}} = \beta_0 + \beta_1 \frac{1}{A_{i,t-1}} + \beta_2 \frac{S_{i,t}}{A_{i,t-1}} + \beta_3 \frac{\Delta S_{i,t}}{A_{i,t-1}} + \varepsilon_{i,t}
$$ {#eq-abnormal-cfo}

**Abnormal production costs.** Production costs are defined as COGS plus the change in inventory. Normal levels are modeled as:

$$
\frac{PROD_{i,t}}{A_{i,t-1}} = \beta_0 + \beta_1 \frac{1}{A_{i,t-1}} + \beta_2 \frac{S_{i,t}}{A_{i,t-1}} + \beta_3 \frac{\Delta S_{i,t}}{A_{i,t-1}} + \beta_4 \frac{\Delta S_{i,t-1}}{A_{i,t-1}} + \varepsilon_{i,t}
$$ {#eq-abnormal-prod}

**Abnormal discretionary expenditures.** Discretionary expenditures are the sum of R&D, advertising, and SG&A, modeled as:

$$
\frac{DISC_{i,t}}{A_{i,t-1}} = \beta_0 + \beta_1 \frac{1}{A_{i,t-1}} + \beta_2 \frac{S_{i,t-1}}{A_{i,t-1}} + \varepsilon_{i,t}
$$ {#eq-abnormal-disc}

Residuals from these regressions serve as proxies for real manipulation. A firm that overproduces will show abnormally *high* production costs and abnormally *low* CFO (cash tied up in inventory).

## Summary

This chapter examined the measurement and detection of earnings management, with a focus on methodological rigor and adaptation to Vietnam's institutional environment. The key takeaways are:

- Detecting earnings management requires separating discretionary from non-discretionary accruals, which depends critically on the quality of the non-discretionary accrual model.
  All five canonical models have known weaknesses.
- Under the null hypothesis of no manipulation, the Jones and Modified Jones models over-reject when standard two-stage standard errors are used. The @salkever1976use correction provides properly calibrated inference.
- When test firms experience extreme financial performance, all models exhibit severe size distortion, misattributing performance-driven accrual variation to managerial discretion. Performance matching [@kothari2005performance] partially addresses this.
- Power analysis reveals that economically plausible levels of manipulation (below 10% of assets) are detected with very low probability. This casts doubt on studies that report null results as evidence of no manipulation.
- Vietnam's institutional features, such as SOE governance, VAS accounting rules, weak enforcement, and thin audit coverage, create a rich setting for earnings management research, but the methodological challenges are amplified by shorter time series and noisier data.

Researchers working with Vietnamese data should: (i) use the Salkever correction for proper inference, (ii) implement performance matching, (iii) consider multiple models and triangulate results, and (iv) supplement accrual-based approaches with real earnings management and distributional tests.
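Recommendation (ii) can be sketched in a few lines. The function below is a hypothetical helper, not part of the chapter's pipeline: it assumes one row per firm-year and columns named `sic2`, `fyear`, `roa`, and `da_jones`, and it matches each firm to the other firm in the same industry-year with the closest ROA, which is one simple reading of the matching rule in @kothari2005performance. The toy values are made up for illustration.

```python
import numpy as np
import pandas as pd


def performance_match(
    df: pd.DataFrame,
    year_col: str = "fyear",
    ind_col: str = "sic2",
    roa_col: str = "roa",
    da_col: str = "da_jones",
) -> pd.DataFrame:
    """Subtract the DA of the closest-ROA peer in the same industry-year."""
    out = df.reset_index(drop=True).copy()
    out["da_pm"] = np.nan
    for _, cell in out.groupby([ind_col, year_col]):
        if len(cell) < 2:
            continue                      # no peer available: leave NaN
        roa = cell[roa_col].to_numpy()
        da = cell[da_col].to_numpy()
        # Pairwise ROA distances; a firm may not match itself
        dist = np.abs(roa[:, None] - roa[None, :])
        np.fill_diagonal(dist, np.inf)
        match = dist.argmin(axis=1)
        out.loc[cell.index, "da_pm"] = da - da[match]
    return out


# Toy data (illustrative values only)
toy = pd.DataFrame({
    "ticker": ["A", "B", "C", "D"],
    "fyear": [2020, 2020, 2020, 2020],
    "sic2": ["10", "10", "10", "20"],
    "roa": [0.10, 0.11, 0.30, 0.05],
    "da_jones": [0.05, 0.02, 0.08, 0.01],
})
matched = performance_match(toy)
print(matched[["ticker", "da_jones", "da_pm"]])
```

In thin industry-years the closest peer can still be far away in ROA, as for firm C in the toy data, so in practice a maximum ROA distance or a broader industry definition may be needed.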

| Symbol | Definition |
|--------|------------|
| $TA_{i,t}$ | Total accruals (scaled by lagged assets) |
| $NDA_{i,t}$ | Non-discretionary accruals (model prediction) |
| $DA_{i,t}$ | Discretionary accruals: $DA_{i,t} = TA_{i,t} - NDA_{i,t}$ |
| $A_{i,t}$ | Total assets |
| $\Delta Rev_{i,t}$ | Change in revenues, scaled by $A_{i,t-1}$ |
| $\Delta Rec_{i,t}$ | Change in net receivables, scaled by $A_{i,t-1}$ |
| $PPE_{i,t}$ | Gross property, plant, and equipment, scaled by $A_{i,t-1}$ |
| $PART_{i,t}$ | Indicator equal to 1 for the test (event) year |