30 Disclosure Quality and Timing

Corporate disclosure is the primary mechanism through which firms communicate with capital markets. The quality, quantity, and timing of disclosures shape the information environment in which investors form expectations, price securities, and allocate capital. A large theoretical and empirical literature, surveyed by Healy and Palepu (2001) and Beyer et al. (2010), demonstrates that disclosure decisions have first-order effects on the cost of capital, liquidity, and investment efficiency.

This chapter brings two decades of disclosure research to the Vietnamese market, where several institutional features create a distinctive setting. First, Vietnam’s regulatory framework, anchored by Circular 155/2015/TT-BTC (amended by Circular 96/2020/TT-BTC) and enforced by the State Securities Commission (SSC), mandates periodic and event-driven disclosures with specific deadlines that differ from U.S. and European norms. Second, the dominance of retail investors and relatively thin analyst coverage means that corporate disclosures are often the primary source of firm-specific information, amplifying their economic importance. Third, the ongoing transition from Vietnamese Accounting Standards (VAS) toward IFRS convergence introduces time-varying changes in disclosure requirements that create natural variation for empirical analysis.

30.1 Theoretical Foundations

30.1.1 Voluntary Disclosure Theory

The foundational model of voluntary disclosure is due to Verrecchia (1983). In a setting where investors know that a manager possesses private information, silence is interpreted as bad news, which pushes managers toward disclosure. Because disclosure carries a proprietary cost, the unraveling is only partial: managers disclose when the news is good enough that the benefit exceeds the cost, and withhold otherwise. The key insight is that non-disclosure is itself informative, because investors rationally infer that withheld information is unfavourable.
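The disclose-or-withhold decision can be made concrete with a small numerical sketch. It rests on simplifying assumptions that are not part of the text above: risk-neutral pricing, firm value $v \sim N(\mu, \sigma^2)$, and a fixed proprietary cost $c$. Under these assumptions the manager discloses when $v$ exceeds a threshold $x^*$ that solves $x^* - c = E[v \mid v < x^*]$. The function names and parameter values are illustrative only.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF; erfc keeps precision in the left tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def mean_below(x, mu=0.0, sigma=1.0):
    """E[v | v < x] for v ~ N(mu, sigma^2): the price investors set after silence."""
    z = (x - mu) / sigma
    return mu - sigma * normal_pdf(z) / normal_cdf(z)

def disclosure_threshold(cost, mu=0.0, sigma=1.0, tol=1e-10):
    """Solve x - cost = E[v | v < x] by bisection (assumed simplified setting)."""
    def gap(x):
        # Positive when disclosing x (net of cost) beats the silence price
        return x - mean_below(x, mu, sigma) - cost

    lo = mu - 30.0 * sigma
    hi = mu + cost + sigma          # gap(hi) > 0 by construction
    if gap(lo) > 0:
        raise ValueError("cost is too small for the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for c in (0.25, 0.5, 1.0):
    x_star = disclosure_threshold(c)
    print(f"cost={c:.2f}  threshold={x_star:.3f}  "
          f"P(disclose)={1 - normal_cdf(x_star):.3f}")
```

In this sketch a higher proprietary cost raises the threshold, so a larger share of firms stays silent and the price conditional on silence is less punitive.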

Diamond (1985) extends the analysis to a multi-period setting where the firm’s disclosure policy affects the precision of public information and hence the incentives for private information acquisition. The central trade-off is between reducing information asymmetry (which lowers the cost of capital) and reducing the rents that informed traders earn (which may discourage monitoring). Diamond and Verrecchia (1991) formalize the link between disclosure and liquidity: by reducing adverse selection, voluntary disclosure narrows bid-ask spreads and increases the willingness of uninformed investors to trade.

The empirical prediction is that firms with higher-quality disclosure should enjoy:

Lower cost of equity capital (Botosan 1997; Botosan and Plumlee 2002)
Lower cost of debt (Sengupta 1998)
Higher liquidity and lower bid-ask spreads (Diamond and Verrecchia 1991; Lang, Lins, and Maffett 2012)
More efficient investment decisions (Biddle, Hilary, and Verdi 2009)

30.1.2 Strategic Disclosure Timing

Even when disclosure itself is mandatory, managers retain discretion over when, within the permissible window, to release information. Patell and Wolfson (1982) document that firms tend to release good news during trading hours and bad news after market close. DellaVigna and Pollet (2009) show that earnings announced on Fridays, when investor attention is lower, generate smaller immediate reactions and larger post-announcement drift, consistent with limited attention. Hirshleifer, Lim, and Teoh (2009) generalize this finding: extraneous events that distract investors (such as a large number of concurrent announcements) reduce the immediate price response to earnings news.

In Vietnam, several features make strategic timing particularly relevant. The concentrated disclosure calendar, where many firms file near regulatory deadlines, creates natural variation in announcement congestion. The retail-dominated investor base may be more susceptible to attention effects than institutional investors. The regulatory structure, which imposes penalties for late filing but allows discretion within the permissible window, creates a setting in which the choice of filing date is informative.

30.1.3 Disclosure Quality in Emerging Markets

Ball, Robin, and Wu (2003) argue that accounting quality is shaped more by reporting incentives than by accounting standards. In institutional environments with weak enforcement, concentrated ownership, and close alignment between financial and tax reporting, firms may produce lower-quality disclosures even under nominally rigorous standards. Leuz, Nanda, and Wysocki (2003) confirm this pattern internationally: earnings management (an inverse proxy for disclosure quality) is highest in countries with weak investor protection, concentrated ownership, and less developed capital markets.

Vietnam exhibits several of these features. Bushman et al. (2004) classify determinants of transparency into governance factors (legal origin, judicial efficiency, minority protection) and political factors (state ownership, government intervention). Vietnam’s civil-law tradition, significant state ownership in listed firms, and evolving enforcement capacity suggest that disclosure quality may be lower on average than in developed markets, but with substantial cross-sectional variation driven by firm-level governance and ownership structures.

30.2 Regulatory Framework

30.2.1 Mandatory Disclosure Requirements

Vietnamese disclosure regulation operates through a hierarchy of legal instruments:

Securities Law (2019): Establishes the general obligation of listed firms to disclose information truthfully, accurately, completely, and on time (Article 118).
Circular 155/2015/TT-BTC (amended by Circular 96/2020/TT-BTC): Specifies the content, format, and deadlines for periodic and event-driven disclosures.
SSC decisions and guidance: Provide implementation details and sector-specific requirements.

The key periodic reporting deadlines are summarized in Table 30.1.

Table 30.1: Periodic disclosure deadlines under Vietnamese securities regulation.

Report Type	Deadline	Audit Requirement
Annual financial statements	90 days after fiscal year-end	Audited
Semi-annual financial statements	45 days after period-end	Reviewed
Quarterly financial statements	20 days after quarter-end	Unaudited
Annual report	110 days after fiscal year-end	N/A (narrative)

Event-driven (ad hoc) disclosures must be filed within 24 hours for material events, including changes in ownership exceeding 1% by major shareholders, board resolutions on dividends or capital increases, and any event that may materially affect the share price.

30.2.2 Penalties for Non-Compliance

The SSC may impose administrative fines for late or incomplete disclosure, typically VND 50–100 million for minor violations and up to VND 500 million for material omissions. Although these amounts are small relative to the size of large-cap firms, the reputational cost and the risk of trading suspension provide additional deterrence.

30.3 Data Construction

30.3.1 Loading Required Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from datetime import datetime, timedelta
from scipy import stats
from sklearn.preprocessing import StandardScaler
from linearmodels.panel import PanelOLS
import warnings
warnings.filterwarnings('ignore')

# Plotting defaults
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'figure.dpi': 150,
    'font.size': 11,
    'axes.spines.top': False,
    'axes.spines.right': False
})

30.3.2 Retrieving Disclosure Data

We assume that we have structured data on filing dates, announcement timestamps, and the textual content of corporate disclosures for all firms.

from datacore import DataCoreClient

client = DataCoreClient()

# Filing metadata: announcement dates, filing dates, report types
filings = client.get_filings(
    exchanges=['HOSE', 'HNX'],
    report_types=['annual', 'semi_annual', 'quarterly'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    fields=[
        'ticker', 'report_type', 'fiscal_year', 'fiscal_quarter',
        'fiscal_year_end', 'filing_date', 'announcement_date',
        'auditor', 'audit_opinion', 'file_url'
    ]
)

# Financial statement data
financials = client.get_fundamentals(
    exchanges=['HOSE', 'HNX'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    fields=[
        'ticker', 'fiscal_year', 'fiscal_quarter',
        'total_assets', 'total_equity', 'revenue', 'net_income',
        'operating_cash_flow', 'total_accruals',
        'market_cap', 'book_to_market'
    ]
)

# Daily trading data for event studies
trading = client.get_daily_prices(
    exchanges=['HOSE', 'HNX'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    fields=[
        'ticker', 'date', 'close', 'volume', 'turnover',
        'bid_ask_spread', 'market_return'
    ]
)

# Ownership and governance
governance = client.get_governance(
    exchanges=['HOSE', 'HNX'],
    fields=[
        'ticker', 'fiscal_year', 'state_ownership_pct',
        'foreign_ownership_pct', 'board_size',
        'board_independence_pct', 'big4_auditor',
        'dual_listing'
    ]
)

print(f"Filings: {filings.shape[0]:,} records")
print(f"Financials: {financials.shape[0]:,} records")
print(f"Trading: {trading.shape[0]:,} records")
print(f"Governance: {governance.shape[0]:,} records")

30.3.3 Computing Filing Timeliness

We define reporting lag as the number of calendar days between the fiscal period-end and the date the firm’s financial statements are made publicly available. For annual reports, the regulatory maximum is 90 days; firms that file earlier than the deadline reveal information sooner, while firms that file late face potential penalties and signal possible difficulties with their accounts.

filings['fiscal_year_end'] = pd.to_datetime(filings['fiscal_year_end'])
filings['filing_date'] = pd.to_datetime(filings['filing_date'])
filings['announcement_date'] = pd.to_datetime(filings['announcement_date'])

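# Reporting lag = filing date - fiscal period end.
# If 'fiscal_year_end' holds the fiscal year-end rather than the period-end
# date, this lag is meaningful for annual filings only; interim reports
# would need their own period-end date.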
filings['reporting_lag'] = (
    filings['filing_date'] - filings['fiscal_year_end']
).dt.days

# Regulatory deadline based on report type
deadline_map = {
    'annual': 90,
    'semi_annual': 45,
    'quarterly': 20
}
filings['deadline_days'] = filings['report_type'].map(deadline_map)

# Late filing indicator
filings['late_filing'] = (
    filings['reporting_lag'] > filings['deadline_days']
).astype(int)

# Days relative to deadline (negative = early, positive = late)
filings['days_relative_deadline'] = (
    filings['reporting_lag'] - filings['deadline_days']
)

# Summary statistics
annual_filings = filings[filings['report_type'] == 'annual'].copy()
print("Annual Report Filing Lag (calendar days):")
print(annual_filings['reporting_lag'].describe().round(1))
print(f"\nLate filing rate: {annual_filings['late_filing'].mean():.1%}")

30.3.4 Distribution of Filing Lags

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
axes[0].hist(
    annual_filings['reporting_lag'].dropna(),
    bins=60, range=(20, 150),
    color='#2C5F8A', edgecolor='white', alpha=0.85
)
axes[0].axvline(x=90, color='#C0392B', linestyle='--', linewidth=2,
                label='90-day deadline')
axes[0].set_xlabel('Reporting Lag (Calendar Days)')
axes[0].set_ylabel('Number of Filings')
axes[0].set_title('Distribution of Annual Report Filing Lags')
axes[0].legend()

# Time trend: median lag by year
median_lag = (
    annual_filings
    .groupby('fiscal_year')['reporting_lag']
    .agg(['median', lambda x: x.quantile(0.25),
          lambda x: x.quantile(0.75)])
)
median_lag.columns = ['median', 'p25', 'p75']

axes[1].fill_between(
    median_lag.index, median_lag['p25'], median_lag['p75'],
    alpha=0.3, color='#2C5F8A', label='IQR'
)
axes[1].plot(
    median_lag.index, median_lag['median'],
    color='#2C5F8A', linewidth=2, marker='o', label='Median'
)
axes[1].axhline(y=90, color='#C0392B', linestyle='--',
                linewidth=1.5, label='Deadline')
axes[1].set_xlabel('Fiscal Year')
axes[1].set_ylabel('Filing Lag (Calendar Days)')
axes[1].set_title('Median Annual Filing Lag Over Time')
axes[1].legend()

plt.tight_layout()
plt.show()

Figure 30.1

30.4 Measuring Disclosure Quality

Disclosure quality is inherently multidimensional. Following Dechow, Ge, and Schrand (2010) and Beyer et al. (2010), we construct proxies along four dimensions: (i) timeliness, (ii) textual properties, (iii) accounting quality, and (iv) voluntary disclosure breadth.

30.4.1 Timeliness as a Quality Dimension

Timely disclosure reduces the duration of information asymmetry between insiders and outside investors. Chambers and Penman (1984) and Givoly and Palmon (1982) establish that early reporters tend to announce good news, while late reporters more often deliver bad news. We test this pattern in the Vietnamese context below.

We operationalize timeliness through two measures:

Reporting lag (continuous): Calendar days from fiscal period-end to filing date, as constructed in Section 30.3.3.
Early/late classification (categorical): We classify firm-years into terciles based on reporting lag within each fiscal year. This controls for secular trends in filing speed (e.g., driven by regulatory changes or COVID-19 disruptions).

annual_filings['lag_tercile'] = (
    annual_filings
    .groupby('fiscal_year')['reporting_lag']
    .transform(lambda x: pd.qcut(x, 3, labels=['Early', 'Middle', 'Late']))
)

# Tabulate
tercile_stats = (
    annual_filings
    .groupby('lag_tercile')['reporting_lag']
    .agg(['count', 'mean', 'median', 'std'])
    .round(1)
)
tercile_stats.columns = ['N', 'Mean Lag', 'Median Lag', 'SD']
print(tercile_stats)

30.4.2 Textual Quality Measures

The textual properties of corporate disclosures convey information about quality beyond what is captured by accounting numbers alone. Li (2008) demonstrates that annual reports with lower readability are associated with lower earnings persistence, suggesting that complex language may obscure unfavourable information. Loughran and McDonald (2014) critique the application of general readability formulas (Fog index, Flesch-Kincaid) to financial text, arguing that these metrics confound complexity with technical terminology.

We construct three textual quality measures adapted for Vietnamese corporate disclosures:

30.4.2.1 Document Length and Specificity

Longer disclosures are not inherently better, since length may reflect boilerplate or obfuscation. Dyer, Lang, and Stice-Lawrence (2017) document that the growth in the length of U.S. 10-K filings over time was accompanied by more boilerplate and redundancy and by less specificity and hard information. We therefore measure both length and specificity:

Total word count: Raw length of the annual report narrative sections (MD&A equivalent)
Numerical density: Proportion of tokens that are numbers, percentages, or currency amounts, which is a proxy for specificity.

import re
from underthesea import word_tokenize

def compute_textual_metrics(text):
    """Compute textual quality metrics for Vietnamese corporate text."""
    # Treat missing values (e.g. NaN from a failed text extraction) like empty text
    if not isinstance(text, str) or not text.strip():
        return {
            'word_count': 0, 'sentence_count': 0,
            'numerical_density': 0, 'avg_sentence_length': 0,
            'unique_word_ratio': 0, 'forward_looking_density': 0
        }

    # Vietnamese word segmentation
    tokens = word_tokenize(text)
    # Split on sentence punctuation, but not on separators inside numbers (e.g. 1.234)
    sentences = re.split(r'[.!?。](?!\d)', text)
    sentences = [s.strip() for s in sentences if len(s.strip()) > 5]

    word_count = len(tokens)
    sentence_count = max(len(sentences), 1)

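    # Numerical density: proportion of tokens that are numeric.
    # The lookahead requires at least one digit, so bare punctuation
    # tokens such as "." or "," are not counted as numbers.
    num_pattern = re.compile(r'^(?=.*\d)[\d,.%]+$')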
    numeric_tokens = sum(1 for t in tokens if num_pattern.match(t))
    numerical_density = numeric_tokens / max(word_count, 1)

    # Lexical diversity: unique words / total words
    unique_words = len(set(t.lower() for t in tokens))
    unique_word_ratio = unique_words / max(word_count, 1)

    # Forward-looking statement density
    forward_keywords = [
        'dự kiến', 'kế hoạch', 'mục tiêu', 'triển vọng',
        'định hướng', 'chiến lược', 'tương lai', 'sẽ',
        'dự báo', 'phấn đấu', 'cam kết', 'hướng tới'
    ]
    text_lower = text.lower()
    forward_count = sum(text_lower.count(kw) for kw in forward_keywords)
    forward_looking_density = forward_count / max(sentence_count, 1)

    return {
        'word_count': word_count,
        'sentence_count': sentence_count,
        'numerical_density': numerical_density,
        'avg_sentence_length': word_count / sentence_count,
        'unique_word_ratio': unique_word_ratio,
        'forward_looking_density': forward_looking_density
    }

# Retrieve annual report text from DataCore
annual_text = client.get_annual_report_text(
    exchanges=['HOSE', 'HNX'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    sections=['mda', 'business_overview', 'risk_factors']
)

# Apply textual metrics
textual_metrics = annual_text.apply(
    lambda row: compute_textual_metrics(row['text']),
    axis=1, result_type='expand'
)
annual_text = pd.concat([annual_text, textual_metrics], axis=1)

print("Textual Quality Summary Statistics:")
print(annual_text[['word_count', 'numerical_density',
                    'avg_sentence_length', 'unique_word_ratio',
                    'forward_looking_density']].describe().round(3))

30.4.2.2 Forward-Looking Statement Density

Forward-looking statements reveal management’s expectations about future performance and are considered a higher-quality form of disclosure because they expose the manager to ex-post evaluation. In Vietnamese reports, forward-looking language typically appears in the form of phrases like dự kiến (expected), kế hoạch (plan), mục tiêu (target), and triển vọng (outlook).

Guay, Samuels, and Taylor (2016) show that managers use voluntary disclosure to “guide through the fog” when financial statements are complex. We operationalize forward-looking density as the number of forward-looking phrases per sentence, following the keyword approach in our compute_textual_metrics function above.
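Because the keyword approach counts raw substrings, a short keyword such as sẽ could in principle match inside a longer token. The sketch below shows one possible refinement, offered as an illustration rather than as the chapter's method: it matches keywords only at word boundaries and normalizes Unicode first, since Vietnamese diacritics can be encoded in more than one way. The function name, the shortened keyword list, and the sample sentence are all hypothetical.

```python
import re
import unicodedata

# Illustrative subset of the keyword list used in compute_textual_metrics
FORWARD_KEYWORDS = ['dự kiến', 'kế hoạch', 'mục tiêu', 'triển vọng', 'sẽ']

def _nfc(s):
    """Normalize to composed form so identical-looking strings compare equal."""
    return unicodedata.normalize('NFC', s)

# (?<!\w) and (?!\w) reject matches that are glued to another letter or digit
_PATTERN = re.compile(
    r'(?<!\w)(?:' + '|'.join(re.escape(_nfc(k)) for k in FORWARD_KEYWORDS) + r')(?!\w)',
    flags=re.IGNORECASE
)

def count_forward_phrases(text):
    """Count forward-looking keywords that appear as whole words or phrases."""
    return len(_PATTERN.findall(_nfc(text)))

sample = "Công ty dự kiến doanh thu sẽ tăng. Kế hoạch năm 2025 đã được phê duyệt."
n_sentences = 2
print(count_forward_phrases(sample) / n_sentences)
```

Dividing the count by the number of sentences gives the same per-sentence density as before, so the refinement can be swapped in without changing the interpretation of the measure.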

30.4.3 Accounting-Based Quality Proxies

We complement textual measures with accounting-based proxies that capture the reliability of reported financial information.

30.4.3.1 Accruals Quality

Following Francis et al. (2005), who build on the Dechow and Dichev (2002) model (hereafter DD), we measure accruals quality as the standard deviation of residuals from a regression of working capital accruals on past, current, and future operating cash flows:

\[ \frac{WC_{i,t}}{A_{i,t-1}} = \alpha + \beta_1 \frac{CFO_{i,t-1}}{A_{i,t-1}} + \beta_2 \frac{CFO_{i,t}}{A_{i,t-1}} + \beta_3 \frac{CFO_{i,t+1}}{A_{i,t-1}} + \varepsilon_{i,t} \tag{30.1}\]

where $WC_{i,t}$ is working capital accruals, $CFO_{i,t}$ is operating cash flow, and $A_{i,t-1}$ is lagged total assets. The firm-level standard deviation of $\hat{\varepsilon}_{i,t}$ over a rolling window (typically 5 years) is the accruals quality measure, with higher values indicating lower quality.

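def estimate_accruals_quality(df, min_obs=5):
    """
    Estimate accruals quality as std dev of DD residuals
    over a rolling 5-year window for each firm.

    Assumes one row per firm-year. The dependent variable here is
    'total_accruals'; substitute working capital accruals, if available,
    to match Equation 30.1. With five observations and four parameters,
    each window regression leaves a single residual degree of freedom,
    so the estimates are noisy.
    """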
    results = []

    for ticker, group in df.groupby('ticker'):
        group = group.sort_values('fiscal_year')

        # Construct leads/lags of CFO
        group['cfo_lag1'] = group['operating_cash_flow'].shift(1)
        group['cfo_lead1'] = group['operating_cash_flow'].shift(-1)

        # Scale by lagged assets
        group['lag_assets'] = group['total_assets'].shift(1)
        for col in ['total_accruals', 'operating_cash_flow',
                     'cfo_lag1', 'cfo_lead1']:
            group[f'{col}_scaled'] = group[col] / group['lag_assets']

        # Rolling 5-year residual std dev
        for idx in range(len(group)):
            window = group.iloc[max(0, idx - 4):idx + 1]
            window = window.dropna(subset=[
                'total_accruals_scaled', 'operating_cash_flow_scaled',
                'cfo_lag1_scaled', 'cfo_lead1_scaled'
            ])

            if len(window) >= min_obs:
                y = window['total_accruals_scaled']
                X = sm.add_constant(window[[
                    'cfo_lag1_scaled',
                    'operating_cash_flow_scaled',
                    'cfo_lead1_scaled'
                ]])
                try:
                    model = sm.OLS(y, X).fit()
                    results.append({
                        'ticker': ticker,
                        'fiscal_year': group.iloc[idx]['fiscal_year'],
                        'accruals_quality': model.resid.std()
                    })
                except Exception:
                    pass

    return pd.DataFrame(results)

aq_df = estimate_accruals_quality(financials)
print(f"Accruals quality computed for {aq_df['ticker'].nunique()} firms")
print(aq_df['accruals_quality'].describe().round(4))
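An alternative that sits closer to the verbal description (a firm-level standard deviation of residuals over a rolling window) is to estimate Equation 30.1 cross-sectionally each year and then take a rolling standard deviation of each firm's residuals. The sketch below illustrates that variant on synthetic data. It is not the estimation used in this chapter, and the column names (`wc`, `cfo_lag`, `cfo`, `cfo_lead`, assumed to be already scaled by lagged assets) are hypothetical.

```python
import numpy as np
import pandas as pd

def pooled_accruals_quality(df, window=5):
    """Year-by-year OLS of accruals on CFO terms, then a rolling
    standard deviation of each firm's residuals."""
    cols = ['wc', 'cfo_lag', 'cfo', 'cfo_lead']
    df = df.dropna(subset=cols).reset_index(drop=True)
    df['resid'] = np.nan
    for _, grp in df.groupby('fiscal_year'):
        # Design matrix with an intercept and the three CFO terms
        X = np.column_stack([np.ones(len(grp)),
                             grp[['cfo_lag', 'cfo', 'cfo_lead']].to_numpy()])
        y = grp['wc'].to_numpy()
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        df.loc[grp.index, 'resid'] = y - X @ beta
    df = df.sort_values(['ticker', 'fiscal_year'])
    # Rolling window counts rows, so it assumes consecutive firm-years
    df['accruals_quality'] = (
        df.groupby('ticker')['resid']
        .transform(lambda s: s.rolling(window, min_periods=window).std())
    )
    return df[['ticker', 'fiscal_year', 'accruals_quality']]

# Synthetic panel: the first 20 firms have small accrual errors, the rest large ones
rng = np.random.default_rng(0)
rows = []
for i in range(40):
    noise_sd = 0.01 if i < 20 else 0.05
    for year in range(2012, 2022):
        cfo_lag, cfo, cfo_lead = rng.normal(0.08, 0.05, size=3)
        wc = 0.2 * cfo_lag - 0.5 * cfo + 0.15 * cfo_lead + rng.normal(0, noise_sd)
        rows.append((f'F{i:02d}', year, wc, cfo_lag, cfo, cfo_lead))
demo = pd.DataFrame(
    rows, columns=['ticker', 'fiscal_year', 'wc', 'cfo_lag', 'cfo', 'cfo_lead']
)

aq_demo = pooled_accruals_quality(demo)
print(aq_demo.dropna().groupby(aq_demo['ticker'] >= 'F20')['accruals_quality'].mean())
```

The cross-sectional fit uses many observations per regression, which avoids the thin degrees of freedom of firm-specific windows, at the cost of assuming common coefficients across firms within a year.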

30.4.3.2 Earnings Persistence and Predictability

Persistent earnings are more useful for valuation. We estimate earnings persistence as the slope coefficient $\phi_1$ from a first-order autoregression:

\[ \frac{E_{i,t}}{A_{i,t-1}} = \phi_0 + \phi_1 \frac{E_{i,t-1}}{A_{i,t-2}} + \nu_{i,t} \tag{30.2}\]

Higher $\hat{\phi}_1$ indicates more persistent (and arguably higher-quality) earnings.

def estimate_persistence(df, min_obs=5):
    """Estimate earnings persistence via AR(1) model."""
    results = []

    for ticker, group in df.groupby('ticker'):
        group = group.sort_values('fiscal_year')
        group['earnings_scaled'] = group['net_income'] / group['total_assets'].shift(1)
        group['earnings_lag'] = group['earnings_scaled'].shift(1)

        clean = group.dropna(subset=['earnings_scaled', 'earnings_lag'])
        if len(clean) >= min_obs:
            y = clean['earnings_scaled']
            X = sm.add_constant(clean[['earnings_lag']])
            model = sm.OLS(y, X).fit()
            results.append({
                'ticker': ticker,
                'persistence': model.params['earnings_lag'],
                'persistence_se': model.bse['earnings_lag'],
                'r_squared': model.rsquared,
                'n_obs': model.nobs
            })

    return pd.DataFrame(results)

persistence_df = estimate_persistence(financials)
print(persistence_df[['persistence', 'r_squared']].describe().round(3))

30.4.4 Composite Disclosure Quality Index

Individual quality proxies capture different facets of the information environment. To aggregate them into a single score while avoiding arbitrary weighting, we follow Lang and Lundholm (1993) and use a rank-based composite. Within each fiscal year, we rank firms on each of the dimensions listed in Table 30.2, oriented so that a higher rank always indicates higher quality.

Table 30.2: Components of the composite disclosure quality index.

Dimension	Proxy	Direction
Timeliness	Reporting lag	Lower is better
Specificity	Numerical density	Higher is better
Forward-looking	FLS density	Higher is better
Earnings quality	Accruals quality (DD)	Lower σ is better
Persistence	AR(1) coefficient	Higher is better

We convert each proxy to a percentile rank within each fiscal year (so each component ranges from 0 to 1), then average across components:

\[ DQ_{i,t} = \frac{1}{K} \sum_{k=1}^{K} \text{Rank}_{k,i,t} \tag{30.3}\]

where $K$ is the number of available components and $\text{Rank}_{k,i,t}$ is the percentile rank of firm $i$ in year $t$ on dimension $k$.
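A toy illustration of Equation 30.3 may help, particularly to show how $K$ varies when a component is missing. The three firms and their values below are made up, and only two of the five components are used to keep the example short.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC'],
    'fiscal_year': [2023, 2023, 2023],
    'reporting_lag': [60, 85, 75],              # lower is better
    'numerical_density': [0.10, 0.04, np.nan],  # higher is better; CCC is missing
})

# Negate "lower is better" proxies so that a higher rank means higher quality
toy['rank_lag'] = (
    toy.groupby('fiscal_year')['reporting_lag']
    .transform(lambda x: (-x).rank(pct=True))
)
toy['rank_num'] = (
    toy.groupby('fiscal_year')['numerical_density']
    .transform(lambda x: x.rank(pct=True))
)

# mean(axis=1) skips NaN, so CCC's index averages over K = 1 component
toy['dq_index'] = toy[['rank_lag', 'rank_num']].mean(axis=1)
print(toy[['ticker', 'rank_lag', 'rank_num', 'dq_index']])
```

Firms with missing components are scored on fewer dimensions, so their index is less diversified across proxies; a minimum-component filter is one way to limit this, if desired.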

# Merge all quality proxies
quality_panel = (
    annual_filings[['ticker', 'fiscal_year', 'reporting_lag']]
    .merge(
        annual_text[['ticker', 'fiscal_year', 'numerical_density',
                      'forward_looking_density']],
        on=['ticker', 'fiscal_year'], how='left'
    )
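    .merge(aq_df, on=['ticker', 'fiscal_year'], how='left')
    # Persistence is a single full-sample estimate per firm, so it is
    # time-invariant and merged on ticker only
    .merge(persistence_df[['ticker', 'persistence']],
           on='ticker', how='left')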
)

# Rank each component within fiscal year (higher = better quality)
rank_cols = {}
for col, ascending in [
    ('reporting_lag', False),       # lower lag = better → invert
    ('numerical_density', True),    # higher = better
    ('forward_looking_density', True),
    ('accruals_quality', False),    # lower volatility = better → invert
    ('persistence', True)           # higher = better
]:
    # Negate "lower is better" proxies so a higher rank always means higher quality
    rank_cols[f'rank_{col}'] = (
        quality_panel
        .groupby('fiscal_year')[col]
        .transform(lambda x: x.rank(pct=True) if ascending
                   else (-x).rank(pct=True))
    )

rank_df = pd.DataFrame(rank_cols)
quality_panel = pd.concat([quality_panel, rank_df], axis=1)

# Composite index: average of available ranks
rank_columns = [c for c in quality_panel.columns if c.startswith('rank_')]
quality_panel['dq_index'] = quality_panel[rank_columns].mean(axis=1)

print("Disclosure Quality Index Distribution:")
print(quality_panel['dq_index'].describe().round(3))

fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(quality_panel['dq_index'].dropna(), bins=50,
        color='#2C5F8A', edgecolor='white', alpha=0.85)
ax.axvline(quality_panel['dq_index'].median(), color='#E67E22',
           linestyle='--', linewidth=2, label='Median')
ax.set_xlabel('Disclosure Quality Index')
ax.set_ylabel('Number of Firm-Years')
ax.set_title('Distribution of Composite Disclosure Quality')
ax.legend()
plt.tight_layout()
plt.show()

Figure 30.2

30.5 Determinants of Disclosure Quality

What drives variation in disclosure quality across Vietnamese firms? We estimate a pooled firm-year regression of the composite DQ index on firm characteristics and governance variables:

\[ DQ_{i,t} = \alpha + \beta_1 \ln(\text{Size}_{i,t}) + \beta_2 \text{ROA}_{i,t} + \beta_3 \text{Lev}_{i,t} + \beta_4 \text{StateOwn}_{i,t} + \beta_5 \text{ForeignOwn}_{i,t} + \beta_6 \text{Big4}_{i,t} + \beta_7 \text{BoardIndep}_{i,t} + \gamma_t + \varepsilon_{i,t} \tag{30.4}\]

where $\gamma_t$ are year fixed effects.

The theoretical predictions, drawing on Lang and Lundholm (1993), Hope (2003), and Bushman et al. (2004), are:

Size (+): Larger firms face greater public scrutiny and have lower proprietary costs relative to the benefits of disclosure.
ROA (+/−): Profitable firms may disclose more to signal quality, but firms managing earnings downward (for tax purposes) may reduce disclosure to avoid scrutiny.
Leverage (+): Sengupta (1998) argues that firms with more debt have stronger incentives to maintain disclosure quality to lower borrowing costs.
State ownership (−): SOEs may face weaker market discipline and may have political incentives to limit transparency.
Foreign ownership (+): Foreign institutional investors demand higher transparency.
Big 4 auditor (+): High-quality auditors constrain earnings management and indirectly improve disclosure quality.
Board independence (+): Independent directors improve monitoring and encourage more informative disclosure.

# Merge quality index with financials and governance
det_panel = (
    quality_panel[['ticker', 'fiscal_year', 'dq_index']]
    .merge(financials, on=['ticker', 'fiscal_year'], how='left')
    .merge(governance, on=['ticker', 'fiscal_year'], how='left')
)

# Construct variables
det_panel['log_size'] = np.log(det_panel['total_assets'])
det_panel['roa'] = det_panel['net_income'] / det_panel['total_assets']
det_panel['leverage'] = (
    (det_panel['total_assets'] - det_panel['total_equity'])
    / det_panel['total_assets']
)

# Panel regression with year FE
det_panel = det_panel.set_index(['ticker', 'fiscal_year'])

model_det = PanelOLS(
    dependent=det_panel['dq_index'],
    exog=sm.add_constant(det_panel[[
        'log_size', 'roa', 'leverage',
        'state_ownership_pct', 'foreign_ownership_pct',
        'big4_auditor', 'board_independence_pct'
    ]]),
    entity_effects=False,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print(model_det.summary)

coefs = model_det.params.drop('const')
ci = model_det.conf_int().drop('const')

fig, ax = plt.subplots(figsize=(8, 5))
y_pos = range(len(coefs))
labels = [
    'ln(Assets)', 'ROA', 'Leverage', 'State Own %',
    'Foreign Own %', 'Big 4 Auditor', 'Board Indep %'
]

colors = ['#2C5F8A' if c > 0 else '#C0392B' for c in coefs.values]
ax.barh(y_pos, coefs.values, color=colors, alpha=0.8, height=0.6)
ax.errorbar(
    coefs.values, y_pos,
    xerr=[coefs.values - ci.iloc[:, 0].values,
          ci.iloc[:, 1].values - coefs.values],
    fmt='none', color='black', capsize=3
)
ax.axvline(x=0, color='gray', linewidth=0.8, linestyle='-')
ax.set_yticks(y_pos)
ax.set_yticklabels(labels)
ax.set_xlabel('Coefficient Estimate')
ax.set_title('Determinants of Disclosure Quality')
plt.tight_layout()
plt.show()

Figure 30.3

30.6 Strategic Disclosure Timing

30.6.1 Day-of-Week Effects

DellaVigna and Pollet (2009) document that Friday earnings announcements receive less immediate market attention. We test whether this pattern holds in Vietnam, where the trading week runs Monday through Friday but the retail-dominated investor base may exhibit different attention patterns.

annual_filings['announcement_dow'] = (
    annual_filings['announcement_date'].dt.dayofweek
)
annual_filings['day_name'] = (
    annual_filings['announcement_date'].dt.day_name()
)

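# Compute surprise: actual earnings minus naive expectation (last year's earnings)
annual_filings = annual_filings.merge(
    financials[['ticker', 'fiscal_year', 'net_income', 'total_assets']],
    on=['ticker', 'fiscal_year'], how='left'
)
annual_filings['earnings_scaled'] = (
    annual_filings['net_income'] / annual_filings['total_assets']
)
# Sort so that diff() compares each year with the firm's previous fiscal year
annual_filings = annual_filings.sort_values(['ticker', 'fiscal_year'])
annual_filings['earnings_surprise'] = (
    annual_filings
    .groupby('ticker')['earnings_scaled']
    .diff()
)

# Classify as good/bad news.
# Caution: a missing surprise (e.g. a firm's first year) compares as False
# and is therefore coded 0; exclude those rows when testing news effects.
annual_filings['bad_news'] = (
    annual_filings['earnings_surprise'] < 0
).astype(int)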

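# Day-of-week distribution by news type
# (rows with an undefined earnings surprise are excluded)
has_surprise = annual_filings['earnings_surprise'].notna()
dow_crosstab = pd.crosstab(
    annual_filings.loc[has_surprise, 'day_name'],
    annual_filings.loc[has_surprise, 'bad_news'].map(
        {0: 'Good News', 1: 'Bad News'}
    ),
    normalize='columns'
)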
# Reorder days
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
dow_crosstab = dow_crosstab.reindex(day_order)

print("Proportion of Announcements by Day and News Type:")
print(dow_crosstab.round(3))

fig, ax = plt.subplots(figsize=(10, 5))
x = np.arange(len(day_order))
width = 0.35

bad_pct = dow_crosstab['Bad News'].values
good_pct = dow_crosstab['Good News'].values

ax.bar(x - width/2, good_pct, width, label='Good News',
       color='#27AE60', alpha=0.8)
ax.bar(x + width/2, bad_pct, width, label='Bad News',
       color='#C0392B', alpha=0.8)
ax.set_xticks(x)
ax.set_xticklabels(day_order)
ax.set_ylabel('Proportion of Announcements')
ax.set_title('Strategic Timing: Day-of-Week Announcement Patterns')
ax.legend()
plt.tight_layout()
plt.show()

Figure 30.4

30.6.2 Announcement Congestion

When many firms announce on the same day, each announcement receives less attention. We measure announcement congestion as the number of other firms making earnings announcements on the same date:

\[ \text{Congestion}_{i,t} = \sum_{j \neq i} \mathbf{1}\{\text{AnnDate}_{j} = \text{AnnDate}_{i}\} \tag{30.5}\]

The limited-attention evidence in Hirshleifer, Lim, and Teoh (2009) implies that managers who wish to bury bad news have an incentive to announce on high-congestion days. We test this by regressing the congestion variable on the sign of earnings news:

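# Count announcements per calendar day. normalize() drops any time-of-day
# component, so intraday timestamps on the same date are grouped together.
annual_filings['ann_day'] = annual_filings['announcement_date'].dt.normalize()
ann_counts = (
    annual_filings
    .groupby('ann_day')
    .size()
    .reset_index(name='n_announcements')
)
annual_filings = annual_filings.merge(
    ann_counts, on='ann_day', how='left'
)
annual_filings['congestion'] = annual_filings['n_announcements'] - 1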

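# Regression: congestion ~ bad_news + controls.
# Build the estimation sample explicitly so that the cluster labels line up
# with the rows actually used once missing values are dropped.
reg_sample = annual_filings.assign(
    log_size=np.log(annual_filings['total_assets']),
    roa=annual_filings['net_income'] / annual_filings['total_assets']
).dropna(subset=['congestion', 'earnings_surprise', 'log_size', 'roa'])

congestion_model = smf.ols(
    'congestion ~ bad_news + log_size + roa + C(fiscal_year)',
    data=reg_sample
).fit(cov_type='cluster', cov_kwds={'groups': reg_sample['ticker']})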

print("Congestion Regression:")
print(congestion_model.summary().tables[1])

30.6.3 After-Hours and Weekend Announcements

Vietnamese regulations require disclosure within 24 hours of material events, but firms retain discretion over the exact timing. Announcements made after the trading session closes (after 3:00 PM on HOSE/HNX) or on weekends delay the market’s opportunity to react by at least one trading day.

# Assume announcement timestamps are available
annual_filings['ann_hour'] = (
    annual_filings['announcement_date'].dt.hour
)
annual_filings['after_hours'] = (
    (annual_filings['ann_hour'] >= 15) |
    (annual_filings['announcement_dow'] >= 5)  # Saturday/Sunday
).astype(int)

# Cross-tabulate after-hours by news type
afterhours_crosstab = pd.crosstab(
    annual_filings['after_hours'].map({0: 'During Hours', 1: 'After Hours'}),
    annual_filings['bad_news'].map({0: 'Good News', 1: 'Bad News'}),
    normalize='index'
)
print("News Distribution by Announcement Timing:")
print(afterhours_crosstab.round(3))

# Chi-squared test
contingency = pd.crosstab(
    annual_filings['after_hours'], annual_filings['bad_news']
)
chi2, p_val, _, _ = stats.chi2_contingency(contingency)
print(f"\nChi-squared = {chi2:.2f}, p-value = {p_val:.4f}")
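
The timing rule described above also matters for any return test: the relevant event day is the first session in which investors can trade on the news. The following is a minimal sketch of that mapping, not part of the chapter's pipeline. It assumes a 15:00 cutoff and a Monday-to-Friday trading week, and it ignores exchange holidays; the helper name `first_reaction_date` is hypothetical.

```python
import pandas as pd
from pandas.tseries.offsets import BDay


def first_reaction_date(ts, close_hour=15):
    """Return the first weekday session in which the market can react.

    Assumptions (illustrative only): trading ends at `close_hour`,
    weekends are non-trading days, and exchange holidays are ignored.
    """
    ts = pd.Timestamp(ts)
    day = ts.normalize()
    # After the close or on a weekend: roll to the next business day
    if ts.dayofweek >= 5 or ts.hour >= close_hour:
        return day + BDay(1)
    return day


# A Friday 16:00 announcement is first tradable on the following Monday
friday_late = first_reaction_date("2024-03-15 16:00")
# A Wednesday 10:00 announcement is tradable the same day
wednesday_am = first_reaction_date("2024-03-13 10:00")
# A Saturday announcement also rolls to Monday
saturday = first_reaction_date("2024-03-16 09:00")
```

Applying such a helper to the announcement timestamp column would give an event date that lines up with the first tradable session, which is usually what an event-study window should be centred on.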

30.7 Market Consequences of Disclosure Quality

30.7.1 Disclosure Quality and the Cost of Equity

The central prediction of Diamond and Verrecchia (1991) and Botosan (1997) is that higher-quality disclosure lowers the cost of equity capital by reducing information asymmetry. We test this using the implied cost of capital (ICC) approach, where we estimate the discount rate that equates the current price to the present value of expected future earnings.

We use the PEG ratio approach as a simple ICC estimate:

\[ r_{PEG,i,t} = \sqrt{\frac{\hat{E}_{i,t+2} - \hat{E}_{i,t+1}}{P_{i,t}}} \tag{30.6}\]

where $\hat{E}_{i,t+k}$ is the consensus earnings forecast (or, in the absence of analyst coverage, a model-based forecast) and $P_{i,t}$ is the current stock price.
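
To make Equation 30.6 concrete, the short sketch below evaluates it for one hypothetical firm. The inputs (forecast earnings per share of VND 2,000 and VND 2,300, and a price of VND 30,000) are invented for illustration, and the function name `peg_icc` is not part of any library.

```python
import math


def peg_icc(e1, e2, price):
    """PEG implied cost of capital: sqrt((E[t+2] - E[t+1]) / P).

    Returns NaN when forecast earnings growth is not positive,
    because the square root is then undefined.
    """
    growth = e2 - e1
    if price <= 0 or growth <= 0:
        return float("nan")
    return math.sqrt(growth / price)


# Hypothetical inputs: E[t+1] = 2,000, E[t+2] = 2,300, P = 30,000 (VND per share)
r = peg_icc(e1=2000.0, e2=2300.0, price=30000.0)
print(f"{r:.3f}")  # prints 0.100
```

The example also shows why the measure is only defined for firms with positive expected earnings growth, so the usable sample is typically smaller than the full panel.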

# Construct earnings forecasts using a simple random walk with drift
forecasts = financials.sort_values(['ticker', 'fiscal_year']).copy()
# Earnings scaled by market capitalization (an earnings yield, E/P),
# so the forecasts below are already expressed per unit of price
forecasts['eps'] = forecasts['net_income'] / forecasts['market_cap']
forecasts['eps_growth'] = forecasts.groupby('ticker')['eps'].pct_change()

# Simple forecast: E[t+1] = E[t] * (1 + avg_growth)
forecasts['avg_growth'] = (
    forecasts.groupby('ticker')['eps_growth']
    .transform(lambda x: x.rolling(3, min_periods=2).mean())
)
forecasts['eps_f1'] = forecasts['eps'] * (1 + forecasts['avg_growth'])
forecasts['eps_f2'] = forecasts['eps_f1'] * (1 + forecasts['avg_growth'])

# PEG-based ICC: because eps is already price-scaled, eps_f2 - eps_f1
# corresponds to (E[t+2] - E[t+1]) / P in Equation 30.6, so no further
# division by price is needed. The estimate is undefined (NaN) when
# forecast earnings growth is not positive.
eps_change = forecasts['eps_f2'] - forecasts['eps_f1']
forecasts['icc_peg'] = np.sqrt(eps_change.where(eps_change > 0))

# Merge with disclosure quality
icc_panel = (
    forecasts[['ticker', 'fiscal_year', 'icc_peg']]
    .merge(quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True),
           on=['ticker', 'fiscal_year'], how='inner')
    .merge(governance, on=['ticker', 'fiscal_year'], how='left')
    .merge(financials[['ticker', 'fiscal_year', 'total_assets',
                        'book_to_market', 'market_cap']],
           on=['ticker', 'fiscal_year'], how='left')
)

icc_panel['log_size'] = np.log(icc_panel['market_cap'])
icc_panel = icc_panel.set_index(['ticker', 'fiscal_year'])

# Panel regression: ICC ~ DQ + controls
icc_model = PanelOLS(
    dependent=icc_panel['icc_peg'],
    exog=sm.add_constant(icc_panel[[
        'dq_index', 'log_size', 'book_to_market'
    ]]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("Implied Cost of Capital ~ Disclosure Quality:")
print(icc_model.summary)

30.7.2 Disclosure Quality and Liquidity

Diamond and Verrecchia (1991) predict that better disclosure reduces adverse selection and improves liquidity. We measure liquidity through bid-ask spreads and the Amihud illiquidity ratio:

\[ \text{Amihud}_{i,t} = \frac{1}{D_{i,t}} \sum_{d=1}^{D_{i,t}} \frac{|R_{i,d}|}{\text{Volume}_{i,d}} \tag{30.7}\]

where $R_{i,d}$ is the daily return and $\text{Volume}_{i,d}$ is the daily trading volume in VND.
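
As a small illustration of Equation 30.7, the sketch below computes the ratio on invented data for two hypothetical tickers (`FIRM_A` and `FIRM_B`). It is only a toy example; the point is that returns are computed within each ticker, so the first day of one firm is never differenced against the last day of another.

```python
import pandas as pd

# Invented prices and volumes for two hypothetical tickers
toy = pd.DataFrame({
    "ticker": ["FIRM_A"] * 3 + ["FIRM_B"] * 3,
    "date": pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"] * 2),
    "close": [100.0, 110.0, 99.0, 50.0, 50.0, 55.0],
    "volume": [1000, 1000, 1000, 200, 200, 200],
})

toy = toy.sort_values(["ticker", "date"])
# Absolute daily return, computed separately for each ticker
toy["abs_return"] = toy.groupby("ticker")["close"].pct_change().abs()
# Daily trading value in currency units (price times shares traded)
toy["value"] = toy["close"] * toy["volume"]
toy["amihud_daily"] = toy["abs_return"] / toy["value"]

# Average over the available days for each ticker
amihud = toy.groupby("ticker")["amihud_daily"].mean()
print(amihud)
```

In this toy sample `FIRM_B` trades far less value per day, so the same absolute return translates into a larger Amihud ratio, that is, a larger price impact per unit of trading value.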

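# Compute annual Amihud illiquidity
# Returns are computed within each ticker so that the first observation of
# one firm is not differenced against the last observation of another
trading = trading.sort_values(['ticker', 'date'])
trading['abs_return'] = trading.groupby('ticker')['close'].pct_change().abs()
# Trading value in VND; zero-volume days are set to missing to avoid
# division by zero
trading_value = (trading['volume'] * trading['close']).replace(0, np.nan)
trading['amihud_daily'] = trading['abs_return'] / trading_value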

amihud_annual = (
    trading
    .assign(fiscal_year=trading['date'].dt.year)
    .groupby(['ticker', 'fiscal_year'])
    .agg(
        amihud=('amihud_daily', 'mean'),
        avg_spread=('bid_ask_spread', 'mean'),
        avg_turnover=('turnover', 'mean')
    )
    .reset_index()
)

# Log transform for better distributional properties
amihud_annual['log_amihud'] = np.log(amihud_annual['amihud'] + 1e-10)
amihud_annual['log_spread'] = np.log(amihud_annual['avg_spread'] + 1e-6)

# Merge and run regression
liq_panel = (
    amihud_annual
    .merge(quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True),
           on=['ticker', 'fiscal_year'], how='inner')
    .merge(financials[['ticker', 'fiscal_year', 'market_cap', 'total_assets']],
           on=['ticker', 'fiscal_year'], how='left')
)
liq_panel['log_size'] = np.log(liq_panel['market_cap'])
liq_panel = liq_panel.set_index(['ticker', 'fiscal_year'])

liq_model = PanelOLS(
    dependent=liq_panel['log_amihud'],
    exog=sm.add_constant(liq_panel[['dq_index', 'log_size']]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("Amihud Illiquidity ~ Disclosure Quality:")
print(liq_model.summary)

liq_panel_plot = liq_panel.reset_index()
liq_panel_plot['dq_quintile'] = pd.qcut(
    liq_panel_plot['dq_index'], 5, labels=['Q1\n(Low)', 'Q2', 'Q3', 'Q4', 'Q5\n(High)']
)

quintile_liq = (
    liq_panel_plot
    .groupby('dq_quintile')['log_amihud']
    .agg(['mean', 'sem'])
)

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(
    range(5), quintile_liq['mean'],
    yerr=1.96 * quintile_liq['sem'],
    color=['#C0392B', '#E67E22', '#F1C40F', '#27AE60', '#2C5F8A'],
    alpha=0.85, capsize=4, edgecolor='white'
)
ax.set_xticks(range(5))
ax.set_xticklabels(quintile_liq.index)
ax.set_xlabel('Disclosure Quality Quintile')
ax.set_ylabel('Log Amihud Illiquidity')
ax.set_title('Disclosure Quality and Market Liquidity')
plt.tight_layout()
plt.show()

Figure 30.5

30.7.3 Event Study: Market Reaction to Filing Lag

We examine whether the market reacts differently to early vs. late filers by computing cumulative abnormal returns (CARs) around the filing date:

\[ CAR_{i}[\tau_1, \tau_2] = \sum_{t=\tau_1}^{\tau_2} (R_{i,t} - \hat{R}_{i,t}) \tag{30.8}\]

where $\hat{R}_{i,t}$ is the expected return from a market model estimated over a pre-event window $[-250, -30]$.

def compute_car(ticker, event_date, trading_df,
                est_window=(-250, -30), event_window=(-5, 10)):
    """Compute CAR around an event date using market model."""
    firm_data = trading_df[trading_df['ticker'] == ticker].copy()
    firm_data = firm_data.sort_values('date').reset_index(drop=True)
    # Compute returns on the full series so that the first day of each
    # window has a valid return
    firm_data['ret'] = firm_data['close'].pct_change()

    # Event day: first trading day on or after the event date
    on_or_after = np.flatnonzero((firm_data['date'] >= event_date).to_numpy())
    if len(on_or_after) == 0:
        return None
    event_pos = on_or_after[0]

    # Check sufficient data on both sides of the event
    if event_pos + est_window[0] < 0:
        return None
    if event_pos + event_window[1] >= len(firm_data):
        return None

    # Estimation window
    est_data = firm_data.iloc[event_pos + est_window[0]:
                              event_pos + est_window[1] + 1]
    valid = est_data['ret'].notna() & est_data['market_return'].notna()
    if valid.sum() < 100:
        return None

    # Market model
    X = sm.add_constant(est_data.loc[valid, 'market_return'])
    model = sm.OLS(est_data.loc[valid, 'ret'], X).fit()

    # Event window
    ev_data = firm_data.iloc[event_pos + event_window[0]:
                             event_pos + event_window[1] + 1]
    expected_ret = (model.params['const']
                    + model.params['market_return'] * ev_data['market_return'])
    abnormal_ret = ev_data['ret'] - expected_ret

    return abnormal_ret.cumsum().values

# Sample: compute CARs for annual filings
car_results = []
# Fixed seed so the random subsample is reproducible
sample = annual_filings.sample(min(2000, len(annual_filings)), random_state=42)
for _, row in sample.iterrows():
    car = compute_car(row['ticker'], row['filing_date'], trading)
    if car is not None and len(car) == 16:  # -5 to +10
        car_results.append({
            'ticker': row['ticker'],
            'fiscal_year': row['fiscal_year'],
            'lag_tercile': row['lag_tercile'],
            'car': car
        })

car_df = pd.DataFrame(car_results)
print(f"Computed CARs for {len(car_df)} firm-year events")

event_days = range(-5, 11)

fig, ax = plt.subplots(figsize=(10, 6))
colors = {'Early': '#27AE60', 'Middle': '#F1C40F', 'Late': '#C0392B'}

for tercile in ['Early', 'Middle', 'Late']:
    subset = car_df[car_df['lag_tercile'] == tercile]
    if len(subset) > 0:
        avg_car = np.mean(np.stack(subset['car'].values), axis=0)
        se_car = np.std(np.stack(subset['car'].values), axis=0) / np.sqrt(len(subset))
        ax.plot(event_days, avg_car, color=colors[tercile],
                linewidth=2, label=tercile)
        ax.fill_between(event_days,
                        avg_car - 1.96 * se_car,
                        avg_car + 1.96 * se_car,
                        color=colors[tercile], alpha=0.15)

ax.axvline(x=0, color='gray', linestyle='--', linewidth=0.8)
ax.axhline(y=0, color='gray', linewidth=0.5)
ax.set_xlabel('Event Day (Relative to Filing Date)')
ax.set_ylabel('Cumulative Abnormal Return')
ax.set_title('Market Reaction Around Filing Date by Timeliness')
ax.legend(title='Filing Tercile')
plt.tight_layout()
plt.show()

Figure 30.6
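
A plot of average CARs does not by itself say whether the means differ from zero. One simple check is a cross-sectional t-statistic on the end-of-window CAR. The sketch below uses invented numbers, and the helper `car_tstat` is hypothetical. It assumes events are independent, which is unlikely to hold exactly when filings cluster around regulatory deadlines, so the result should be read as a rough guide only.

```python
import numpy as np


def car_tstat(final_cars):
    """Cross-sectional mean, t-statistic, and count for end-of-window CARs.

    Assumes independent events; missing values are dropped.
    """
    x = np.asarray(final_cars, dtype=float)
    x = x[~np.isnan(x)]
    n = x.size
    mean = x.mean()
    se = x.std(ddof=1) / np.sqrt(n)  # sample standard error of the mean
    return mean, mean / se, n


# Invented end-of-window CARs for three events
mean_car, t_stat, n_events = car_tstat([0.01, 0.03, 0.02])
print(f"mean CAR = {mean_car:.3f}, t = {t_stat:.2f}, n = {n_events}")
```

With the CAR arrays stored above, the end-of-window value for a tercile could be taken as the last column of `np.stack(subset['car'].values)` and passed to the same function.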

30.8 Filing Timeliness and Earnings Quality

Givoly and Palmon (1982) and Chambers and Penman (1984) establish that the content of disclosed information is correlated with its timing. We test this link formally: do late filers have worse earnings quality?

tq_panel = (
    annual_filings[['ticker', 'fiscal_year', 'lag_tercile', 'reporting_lag']]
    .merge(aq_df, on=['ticker', 'fiscal_year'], how='inner')
    .merge(persistence_df[['ticker', 'persistence']], on='ticker', how='left')
)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Panel A: Accruals quality by tercile
aq_by_tercile = tq_panel.groupby('lag_tercile')['accruals_quality'].mean()
axes[0].bar(
    range(3), aq_by_tercile.values,
    color=['#27AE60', '#F1C40F', '#C0392B'], alpha=0.85,
    edgecolor='white'
)
axes[0].set_xticks(range(3))
axes[0].set_xticklabels(['Early', 'Middle', 'Late'])
axes[0].set_ylabel('Accruals Quality (σ of DD Residuals)')
axes[0].set_title('Panel A: Accruals Quality by Filing Tercile')
axes[0].text(0.05, 0.95, 'Higher = lower quality',
             transform=axes[0].transAxes, fontsize=9,
             verticalalignment='top', style='italic', color='gray')

# Panel B: Persistence by tercile
per_by_tercile = tq_panel.groupby('lag_tercile')['persistence'].mean()
axes[1].bar(
    range(3), per_by_tercile.values,
    color=['#27AE60', '#F1C40F', '#C0392B'], alpha=0.85,
    edgecolor='white'
)
axes[1].set_xticks(range(3))
axes[1].set_xticklabels(['Early', 'Middle', 'Late'])
axes[1].set_ylabel('Earnings Persistence (AR(1) Coefficient)')
axes[1].set_title('Panel B: Earnings Persistence by Filing Tercile')

plt.tight_layout()
plt.show()

Figure 30.7

We formalize this with a regression that controls for firm characteristics:

tq_panel_reg = tq_panel.merge(
    financials[['ticker', 'fiscal_year', 'total_assets', 'net_income',
                'total_equity']],
    on=['ticker', 'fiscal_year'], how='left'
).merge(governance, on=['ticker', 'fiscal_year'], how='left')

tq_panel_reg['log_size'] = np.log(tq_panel_reg['total_assets'])
tq_panel_reg['roa'] = tq_panel_reg['net_income'] / tq_panel_reg['total_assets']
tq_panel_reg['late'] = (tq_panel_reg['lag_tercile'] == 'Late').astype(int)

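# Drop rows with missing regressors so the cluster groups align with the
# estimation sample
reg_cols = ['accruals_quality', 'late', 'log_size', 'roa',
            'state_ownership_pct', 'big4_auditor', 'fiscal_year']
tq_reg_sample = tq_panel_reg.dropna(subset=reg_cols)

model_tq = smf.ols(
    'accruals_quality ~ late + log_size + roa + state_ownership_pct '
    '+ big4_auditor + C(fiscal_year)',
    data=tq_reg_sample
).fit(cov_type='cluster', cov_kwds={'groups': tq_reg_sample['ticker']})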

print("Accruals Quality ~ Late Filing:")
print(model_tq.summary().tables[1])

Endogeneity Caveat

The association between filing timeliness and earnings quality is likely endogenous: firms with complex accounting issues take longer to prepare financial statements, and the same complexity drives lower earnings quality. The filing lag is thus best interpreted as an observable signal of underlying accounting difficulty rather than a causal determinant. Instrumental variable approaches (e.g., using auditor busyness during peak filing season as an instrument for filing lag) can partially address this concern.
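
The logic of the instrumental variable approach can be illustrated on simulated data. The sketch below is not an estimate for Vietnam: every number is invented, `busyness` stands in for a hypothetical auditor-busyness instrument, and `complexity` is an unobserved confounder that drives both the filing lag and accruals quality. The instrument is valid here only by construction; in real data, auditor busyness could also affect audit quality directly, which would violate the exclusion restriction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

busyness = rng.normal(size=n)    # hypothetical instrument
complexity = rng.normal(size=n)  # unobserved confounder
lag = 1.0 * busyness + 1.0 * complexity + rng.normal(size=n)

true_beta = 0.5  # effect of filing lag on (inverse) accruals quality
aq = true_beta * lag + 1.0 * complexity + rng.normal(size=n)


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# Naive OLS is biased upward because complexity is omitted
beta_ols = ols(aq, np.column_stack([const, lag]))[1]

# Two-stage least squares: first stage predicts lag from the instrument,
# second stage regresses the outcome on the predicted lag
Z = np.column_stack([const, busyness])
lag_hat = Z @ ols(lag, Z)
beta_iv = ols(aq, np.column_stack([const, lag_hat]))[1]

print(f"true = {true_beta}, OLS = {beta_ols:.3f}, 2SLS = {beta_iv:.3f}")
```

The manual second stage recovers the point estimate but not valid standard errors; a dedicated IV routine should be used for inference.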

30.9 Disclosure Quality and Investment Efficiency

Biddle, Hilary, and Verdi (2009) demonstrate that higher financial reporting quality is associated with more efficient investment. Specifically, it reduces both over-investment (in firms with excess cash) and under-investment (in firms that are financially constrained). The mechanism is that better disclosure reduces information asymmetry between managers and capital providers, improving the allocation of capital.

We test this prediction in Vietnam using the Biddle, Hilary, and Verdi (2009) framework:

\[ \text{Investment}_{i,t+1} = \alpha + \beta_1 \text{SalesGrowth}_{i,t} + \varepsilon_{i,t+1} \tag{30.9}\]

The residual $\hat{\varepsilon}_{i,t+1}$ measures deviation from expected investment. Positive residuals indicate over-investment; negative residuals indicate under-investment. We then test whether the absolute value of this residual is lower for firms with higher disclosure quality.

inv_panel = financials.sort_values(['ticker', 'fiscal_year']).copy()

# Investment = change in total assets / lagged total assets
inv_panel['investment'] = (
    inv_panel.groupby('ticker')['total_assets'].pct_change()
)
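inv_panel['sales_growth'] = (
    inv_panel.groupby('ticker')['revenue'].pct_change()
)
# Equation 30.9 relates investment in t+1 to sales growth in t,
# so the regressor is lagged by one year within each firm
inv_panel['sales_growth_lag'] = (
    inv_panel.groupby('ticker')['sales_growth'].shift(1)
)
# Growth rates from a zero base are infinite; treat them as missing
inv_panel = inv_panel.replace([np.inf, -np.inf], np.nan)

# Expected investment model
inv_model = smf.ols(
    'investment ~ sales_growth_lag',
    data=inv_panel
).fit()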
inv_panel['inv_residual'] = inv_model.resid
inv_panel['abs_inv_residual'] = inv_panel['inv_residual'].abs()

# Merge with disclosure quality
inv_eff = (
    inv_panel[['ticker', 'fiscal_year', 'abs_inv_residual',
               'investment', 'total_assets']]
    .merge(quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True),
           on=['ticker', 'fiscal_year'], how='inner')
)
inv_eff['log_size'] = np.log(inv_eff['total_assets'])
inv_eff = inv_eff.set_index(['ticker', 'fiscal_year'])

# Panel regression
inv_eff_model = PanelOLS(
    dependent=inv_eff['abs_inv_residual'],
    exog=sm.add_constant(inv_eff[['dq_index', 'log_size']]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("Investment Inefficiency ~ Disclosure Quality:")
print(inv_eff_model.summary)

A negative coefficient on dq_index indicates that higher disclosure quality is associated with lower investment inefficiency: firms with better disclosure make investment decisions closer to what their growth opportunities warrant.
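
Because the absolute residual pools two different deviations, it can help to separate them by sign. The sketch below does this on invented residuals; the column names follow the code above, and the labels `over` and `under` are illustrative.

```python
import numpy as np
import pandas as pd

# Invented investment residuals for four firm-years
toy = pd.DataFrame({"inv_residual": [0.20, -0.10, 0.05, -0.30]})

# Positive residual: investing more than sales growth predicts (over-investment)
# Negative residual: investing less than predicted (under-investment)
toy["direction"] = np.where(toy["inv_residual"] > 0, "over", "under")
toy["abs_inv_residual"] = toy["inv_residual"].abs()

by_direction = toy.groupby("direction")["abs_inv_residual"].mean()
print(by_direction)
```

Running the disclosure-quality regression separately on the two groups would show whether the association comes from curbing over-investment, relieving under-investment, or both.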

30.10 Vietnamese Institutional Context

30.10.1 State Ownership and Disclosure

SOEs account for a substantial share of Vietnamese market capitalization. The relationship between state ownership and disclosure quality is theoretically ambiguous. On one hand, political connections may reduce the pressure to disclose transparently; government shareholders may tolerate opacity that private shareholders would not. On the other hand, post-equitization monitoring by multiple stakeholders (MOF, SCIC, minority shareholders) may create competing disclosure demands.

soe_panel = (
    quality_panel[['ticker', 'fiscal_year', 'dq_index',
                    'reporting_lag']].reset_index(drop=True)
    .merge(governance[['ticker', 'fiscal_year', 'state_ownership_pct']],
           on=['ticker', 'fiscal_year'], how='inner')
)

soe_panel['soe'] = (soe_panel['state_ownership_pct'] >= 50).astype(int)
soe_panel['soe_label'] = soe_panel['soe'].map(
    {1: 'SOE (≥50%)', 0: 'Private (<50%)'}
)

# Compare means
comparison = (
    soe_panel
    .groupby('soe_label')
    .agg(
        n=('dq_index', 'count'),
        mean_dq=('dq_index', 'mean'),
        median_dq=('dq_index', 'median'),
        mean_lag=('reporting_lag', 'mean'),
        median_lag=('reporting_lag', 'median')
    )
    .round(3)
)
print("SOE vs Private Firm Disclosure Comparison:")
print(comparison)

# Welch t-test (does not assume equal variances across the two groups)
soe_dq = soe_panel[soe_panel['soe'] == 1]['dq_index']
priv_dq = soe_panel[soe_panel['soe'] == 0]['dq_index']
t_stat, p_val = stats.ttest_ind(
    soe_dq.dropna(), priv_dq.dropna(), equal_var=False
)
print(f"\nt-test: t = {t_stat:.3f}, p = {p_val:.4f}")

30.10.2 IFRS Convergence and Disclosure Quality

Vietnam has been pursuing a phased convergence toward IFRS, with the Ministry of Finance issuing a roadmap for voluntary adoption by large listed firms. The transition from VAS to IFRS-aligned standards is expected to expand disclosure requirements—particularly for financial instruments (IFRS 9), revenue recognition (IFRS 15), and leases (IFRS 16). Barth, Landsman, and Lang (2008) provide evidence that IFRS adoption is associated with improvements in earnings quality and disclosure, though the effect depends on enforcement strength.

We can exploit the staggered timing of voluntary IFRS adoption across Vietnamese firms as a natural experiment:

# Assume DataCore provides IFRS adoption dates
ifrs_adoption = client.get_ifrs_adoption(
    exchanges=['HOSE', 'HNX'],
    fields=['ticker', 'ifrs_adoption_year']
)

# Merge with quality panel
ifrs_panel = (
    quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True)
    .merge(ifrs_adoption, on='ticker', how='left')
)

# Treatment indicator
ifrs_panel['post_ifrs'] = (
    ifrs_panel['fiscal_year'] >= ifrs_panel['ifrs_adoption_year']
).astype(int).fillna(0)

ifrs_panel['treated'] = ifrs_panel['ifrs_adoption_year'].notna().astype(int)

# Simple DiD
ifrs_panel = ifrs_panel.set_index(['ticker', 'fiscal_year'])
did_model = PanelOLS(
    dependent=ifrs_panel['dq_index'],
    exog=sm.add_constant(ifrs_panel[['post_ifrs']]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("DiD: IFRS Adoption and Disclosure Quality:")
print(did_model.summary)

Identification Concern

Voluntary IFRS adoption is endogenous because firms that choose to adopt early may already have higher-quality disclosure. The two-way fixed effects DiD absorbs time-invariant firm characteristics and common time trends, but cannot fully address selection on time-varying unobservables. Researchers should consider matching estimators (e.g., propensity score matching on pre-adoption characteristics) or instrumental variable approaches as robustness checks.
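
The matching idea can be sketched as follows. The data are simulated, the two covariates are hypothetical stand-ins for pre-adoption characteristics such as size and foreign ownership, and the procedure is a bare one-to-one nearest-neighbour match with replacement and no caliper. It is an illustration of the mechanics only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400

# Hypothetical pre-adoption covariates (e.g., log size, foreign ownership)
X = rng.normal(size=(n, 2))
# Simulated adoption decision: larger firms are more likely to adopt
p_adopt = 1.0 / (1.0 + np.exp(-(-1.5 + 1.0 * X[:, 0])))
treated = (rng.uniform(size=n) < p_adopt).astype(int)

# Step 1: estimate propensity scores with a logit on the covariates
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# Step 2: for each adopter, find the non-adopter with the closest score
treated_idx = np.flatnonzero(treated == 1)
control_idx = np.flatnonzero(treated == 0)
dist = np.abs(ps[treated_idx][:, None] - ps[control_idx][None, :])
matches = control_idx[dist.argmin(axis=1)]

# Step 3: check covariate balance in the matched sample
gap_before = X[treated_idx, 0].mean() - X[control_idx, 0].mean()
gap_after = X[treated_idx, 0].mean() - X[matches, 0].mean()
print(f"Size gap before matching: {gap_before:.3f}, after: {gap_after:.3f}")
```

The matched sample could then be used to re-estimate the difference-in-differences specification, restricting controls to firms that resembled adopters before adoption.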

30.11 Predicting Late Filings

Can we predict which firms will file late? This is valuable for portfolio construction (avoiding potential bad-news firms) and for regulators (targeting enforcement resources). We use a logistic model with financial and governance predictors:

\[ \Pr(\text{Late}_{i,t} = 1) = \Lambda\left(\alpha + \boldsymbol{\beta}'\mathbf{X}_{i,t-1}\right) \tag{30.10}\]

where $\Lambda(\cdot)$ is the logistic function and $\mathbf{X}_{i,t-1}$ are lagged predictors.

from sklearn.metrics import roc_auc_score, classification_report
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

pred_panel = (
    annual_filings[['ticker', 'fiscal_year', 'late_filing']]
    .merge(financials, on=['ticker', 'fiscal_year'], how='left')
    .merge(governance, on=['ticker', 'fiscal_year'], how='left')
)

# Lagged predictors
pred_panel = pred_panel.sort_values(['ticker', 'fiscal_year'])
for col in ['total_assets', 'net_income', 'operating_cash_flow',
            'total_equity', 'revenue']:
    pred_panel[f'{col}_lag'] = pred_panel.groupby('ticker')[col].shift(1)

pred_panel['log_size_lag'] = np.log(pred_panel['total_assets_lag'])
pred_panel['roa_lag'] = (
    pred_panel['net_income_lag'] / pred_panel['total_assets_lag']
)
pred_panel['leverage_lag'] = (
    (pred_panel['total_assets_lag'] - pred_panel['total_equity_lag'])
    / pred_panel['total_assets_lag']
)
pred_panel['cfo_ratio_lag'] = (
    pred_panel['operating_cash_flow_lag'] / pred_panel['total_assets_lag']
)

# Previous late filing indicator
pred_panel['prev_late'] = (
    pred_panel.groupby('ticker')['late_filing'].shift(1)
)

features = [
    'log_size_lag', 'roa_lag', 'leverage_lag', 'cfo_ratio_lag',
    'state_ownership_pct', 'foreign_ownership_pct',
    'big4_auditor', 'board_independence_pct', 'prev_late'
]

clean = pred_panel.dropna(subset=features + ['late_filing'])
X = clean[features]
y = clean['late_filing']

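# Standardize (full sample, used for coefficient interpretation below)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Logistic regression with cross-validation; the scaler is refit inside
# each training fold so that test-fold information does not leak into it
from sklearn.pipeline import make_pipeline
lr = LogisticRegression(max_iter=1000, C=1.0)  # L2 penalty is the default
cv_pipeline = make_pipeline(StandardScaler(), lr)
cv_scores = cross_val_score(cv_pipeline, X, y, cv=5, scoring='roc_auc')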

print(f"5-Fold Cross-Validated AUC: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

# Fit on full sample for coefficient interpretation
lr.fit(X_scaled, y)
coef_df = pd.DataFrame({
    'Feature': features,
    'Coefficient': lr.coef_[0],
    'Odds Ratio': np.exp(lr.coef_[0])
}).sort_values('Coefficient', ascending=False)

print("\nLogistic Regression Coefficients:")
print(coef_df.to_string(index=False))
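
Because the predictors are standardized, each odds ratio refers to a one-standard-deviation change in the predictor. The sketch below converts an odds ratio into a change in predicted probability for a hypothetical firm; the baseline probability of 20% and the coefficient of ln 2 are invented for illustration, and `shift_probability` is not a library function.

```python
import math


def shift_probability(p0, coef, delta=1.0):
    """Probability after moving a predictor by `delta` standard deviations.

    The logit coefficient scales the odds by exp(coef * delta).
    """
    odds = p0 / (1.0 - p0) * math.exp(coef * delta)
    return odds / (1.0 + odds)


# Baseline late-filing probability of 20% and an odds ratio of 2
p1 = shift_probability(p0=0.20, coef=math.log(2))
print(f"{p1:.3f}")  # prints 0.333
```

An odds ratio of 2 therefore raises the probability from 20% to about 33%, not to 40%, which is a common misreading of logit output.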

from sklearn.metrics import roc_curve, auc

# In-sample ROC curve from a full-sample fit; it will tend to look somewhat
# better than the cross-validated AUC reported above
lr.fit(X_scaled, y)
y_prob = lr.predict_proba(X_scaled)[:, 1]
fpr, tpr, _ = roc_curve(y, y_prob)
roc_auc = auc(fpr, tpr)

fig, ax = plt.subplots(figsize=(7, 7))
ax.plot(fpr, tpr, color='#2C5F8A', linewidth=2,
        label=f'Logistic Model (in-sample AUC = {roc_auc:.3f})')
ax.plot([0, 1], [0, 1], color='gray', linestyle='--', linewidth=1)
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('Late Filing Prediction: ROC Curve')
ax.legend(loc='lower right')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()

Figure 30.8

30.12 Summary

This chapter has examined corporate disclosure quality and timing in Vietnam along several dimensions. The key findings and methodological contributions are summarized in Table 30.3.

Table 30.3: Summary of findings by theme.

Theme	Key Result	Reference
Good news early	Early filers earn positive CARs around filing dates	Givoly and Palmon (1982)
Textual quality	Forward-looking density and numerical specificity vary substantially	Li (2008)
Composite DQ index	Foreign ownership and Big 4 auditors are strongest determinants	Botosan (1997)
Cost of capital	Higher DQ is associated with lower implied cost of equity	Diamond and Verrecchia (1991)
Liquidity	Higher DQ firms have lower Amihud illiquidity	Lang, Lins, and Maffett (2012)
Investment efficiency	Higher DQ reduces absolute investment residuals	Biddle, Hilary, and Verdi (2009)
Strategic timing	Evidence of bad-news clustering on high-congestion days	Hirshleifer, Lim, and Teoh (2009)
IFRS adoption	Preliminary evidence of DQ improvement post-adoption	Barth, Landsman, and Lang (2008)

The Vietnamese disclosure environment is shaped by a combination of regulatory mandates (Circular 155, Securities Law 2019), enforcement capacity (SSC penalties and trading suspensions), and firm-level incentives (ownership structure, auditor choice, governance quality). As Vietnam continues its IFRS convergence and capital market development, the information environment is expected to evolve, creating opportunities for researchers to study the dynamics of disclosure quality in a rapidly changing institutional setting.

Ball, Ray, Ashok Robin, and Joanna Shuang Wu. 2003. “Incentives Versus Standards: Properties of Accounting Income in Four East Asian Countries.” Journal of Accounting and Economics 36 (1-3): 235–70.

Barth, Mary E, Wayne R Landsman, and Mark H Lang. 2008. “International Accounting Standards and Accounting Quality.” Journal of Accounting Research 46 (3): 467–98.

Beyer, Anne, Daniel A Cohen, Thomas Z Lys, and Beverly R Walther. 2010. “The Financial Reporting Environment: Review of the Recent Literature.” Journal of Accounting and Economics 50 (2-3): 296–343.

Biddle, Gary C, Gilles Hilary, and Rodrigo S Verdi. 2009. “How Does Financial Reporting Quality Relate to Investment Efficiency?” Journal of Accounting and Economics 48 (2-3): 112–31.

Botosan, Christine A. 1997. “Disclosure Level and the Cost of Equity Capital.” Accounting Review, 323–49.

Botosan, Christine A, and Marlene A Plumlee. 2002. “A Re-Examination of Disclosure Level and the Expected Cost of Equity Capital.” Journal of Accounting Research 40 (1): 21–40.

Bushman, Robert, Qi Chen, Ellen Engel, and Abbie Smith. 2004. “Financial Accounting Information, Organizational Complexity and Corporate Governance Systems.” Journal of Accounting and Economics 37 (2): 167–201.

Chambers, Anne E, and Stephen H Penman. 1984. “Timeliness of Reporting and the Stock Price Reaction to Earnings Announcements.” Journal of Accounting Research, 21–47.

Dechow, Patricia, Weili Ge, and Catherine Schrand. 2010. “Understanding Earnings Quality: A Review of the Proxies, Their Determinants and Their Consequences.” Journal of Accounting and Economics 50 (2-3): 344–401.

DellaVigna, Stefano, and Joshua M Pollet. 2009. “Investor Inattention and Friday Earnings Announcements.” The Journal of Finance 64 (2): 709–49.

Diamond, Douglas W. 1985. “Optimal Release of Information by Firms.” The Journal of Finance 40 (4): 1071–94.

Diamond, Douglas W, and Robert E Verrecchia. 1991. “Disclosure, Liquidity, and the Cost of Capital.” The Journal of Finance 46 (4): 1325–59.

Dyer, Travis, Mark Lang, and Lorien Stice-Lawrence. 2017. “The Evolution of 10-K Textual Disclosure: Evidence from Latent Dirichlet Allocation.” Journal of Accounting and Economics 64 (2-3): 221–45.

Francis, Jennifer, Ryan LaFond, Per Olsson, and Katherine Schipper. 2005. “The Market Pricing of Accruals Quality.” Journal of Accounting and Economics 39 (2): 295–327.

Givoly, Dan, and Dan Palmon. 1982. “Timeliness of Annual Earnings Announcements: Some Empirical Evidence.” Accounting Review, 486–508.

Guay, Wayne, Delphine Samuels, and Daniel Taylor. 2016. “Guiding Through the Fog: Financial Statement Complexity and Voluntary Disclosure.” Journal of Accounting and Economics 62 (2-3): 234–69.

Healy, Paul M, and Krishna G Palepu. 2001. “Information Asymmetry, Corporate Disclosure, and the Capital Markets: A Review of the Empirical Disclosure Literature.” Journal of Accounting and Economics 31 (1-3): 405–40.

Hirshleifer, David, Sonya Seongyeon Lim, and Siew Hong Teoh. 2009. “Driven to Distraction: Extraneous Events and Underreaction to Earnings News.” The Journal of Finance 64 (5): 2289–2325.

Hope, Ole-Kristian. 2003. “Disclosure Practices, Enforcement of Accounting Standards, and Analysts’ Forecast Accuracy: An International Study.” Journal of Accounting Research 41 (2): 235–72.

Lang, Mark, Karl V Lins, and Mark Maffett. 2012. “Transparency, Liquidity, and Valuation: International Evidence on When Transparency Matters Most.” Journal of Accounting Research 50 (3): 729–74.

Lang, Mark, and Russell Lundholm. 1993. “Cross-Sectional Determinants of Analyst Ratings of Corporate Disclosures.” Journal of Accounting Research 31 (2): 246–71.

Leuz, Christian, Dhananjay Nanda, and Peter D Wysocki. 2003. “Earnings Management and Investor Protection: An International Comparison.” Journal of Financial Economics 69 (3): 505–27.

Li, Feng. 2008. “Annual Report Readability, Current Earnings, and Earnings Persistence.” Journal of Accounting and Economics 45 (2-3): 221–47.

Loughran, Tim, and Bill McDonald. 2014. “Measuring Readability in Financial Disclosures.” The Journal of Finance 69 (4): 1643–71.

Patell, James M, and Mark A Wolfson. 1982. “Good News, Bad News, and the Intraday Timing of Corporate Disclosures.” Accounting Review, 509–27.

Sengupta, Partha. 1998. “Corporate Disclosure Quality and the Cost of Debt.” Accounting Review, 459–74.

Verrecchia, Robert E. 1983. “Discretionary Disclosure.” Journal of Accounting and Economics 5: 179–94.

# Disclosure Quality and Timing Corporate disclosure is the primary mechanism through which firms communicate with capital markets. The quality, quantity, and timing of disclosures shape the information environment in which investors form expectations, price securities, and allocate capital. A large theoretical and empirical literature, surveyed by @healy2001information and @beyer2010financial, demonstrates that disclosure decisions have first-order effects on the cost of capital, liquidity, and investment efficiency. This chapter brings two decades of disclosure research to the Vietnamese market, where several institutional features create a distinctive setting. First, Vietnam's regulatory framework, anchored by Circular 155/2015/TT-BTC (amended by Circular 96/2020/TT-BTC) and enforced by the State Securities Commission (SSC), mandates periodic and event-driven disclosures with specific deadlines that differ from U.S. and European norms. Second, the dominance of retail investors and relatively thin analyst coverage means that corporate disclosures are often the *primary* source of firm-specific information, amplifying their economic importance. Third, the ongoing transition from Vietnamese Accounting Standards (VAS) toward IFRS convergence introduces time-varying changes in disclosure requirements that create natural variation for empirical analysis. ## Theoretical Foundations {#sec-dis-qual-theory} ### Voluntary Disclosure Theory The foundational model of voluntary disclosure is due to @verrecchia1983discretionary, who shows that in a setting where investors know a manager possesses private information, an *unraveling* equilibrium emerges: silence is interpreted as bad news, so managers disclose unless the proprietary cost of disclosure exceeds its benefit. The key insight is that non-disclosure is informative because investors rationally infer that withheld information is unfavourable. 
@diamond1985optimal extends the analysis to a multi-period setting where the firm's disclosure policy affects the precision of public information and hence the incentives for private information acquisition. The central trade-off is between reducing information asymmetry (which lowers the cost of capital) and reducing the rents that informed traders earn (which may discourage monitoring). @diamond1991disclosure formalize the link between disclosure and liquidity: by reducing adverse selection, voluntary disclosure narrows bid-ask spreads and increases the willingness of uninformed investors to trade. The empirical prediction is that firms with higher-quality disclosure should enjoy: 1. Lower cost of equity capital [@botosan1997disclosure; @botosan2002re] 2. Lower cost of debt [@sengupta1998corporate] 3. Higher liquidity and lower bid-ask spreads [@diamond1991disclosure; @lang2012transparency] 4. More efficient investment decisions [@biddle2009does] ### Strategic Disclosure Timing Not all disclosure is voluntary in timing, but managers retain discretion over *when,* within permissible windows, to release information. @patell1982good document that firms tend to release good news during trading hours and bad news after market close. @dellavigna2009investor show that earnings announced on Fridays (i.e., when investor attention is lower) generate smaller immediate reactions and larger post-announcement drift, consistent with limited attention. @hirshleifer2009driven generalize this finding: extraneous events that distract investors (such as a large number of concurrent announcements) reduce the immediate price response to earnings news. In Vietnam, several features make strategic timing particularly relevant. The concentrated disclosure calendar, where many firms file near regulatory deadlines, creates natural variation in announcement congestion. The retail-dominated investor base may be more susceptible to attention effects than institutional investors. 
The regulatory structure, which imposes penalties for late filing but allows discretion within the permissible window, creates a setting in which the *choice* of filing date is informative.

### Disclosure Quality in Emerging Markets

@ball2003incentives argue that accounting quality is shaped more by reporting *incentives* than by accounting *standards*. In institutional environments with weak enforcement, concentrated ownership, and close alignment between financial and tax reporting, firms may produce lower-quality disclosures even under nominally rigorous standards. @leuz2003earnings confirm this pattern internationally: earnings management (an inverse proxy for disclosure quality) is highest in countries with weak investor protection, concentrated ownership, and less developed capital markets. Vietnam exhibits several of these features.

@bushman2004financial classify determinants of transparency into governance factors (legal origin, judicial efficiency, minority protection) and political factors (state ownership, government intervention). Vietnam's civil-law tradition, significant state ownership in listed firms, and evolving enforcement capacity suggest that disclosure quality may be lower on average than in developed markets, but with substantial cross-sectional variation driven by firm-level governance and ownership structures.

## Regulatory Framework {#sec-dis-qual-regulation}

### Mandatory Disclosure Requirements

Vietnamese disclosure regulation operates through a hierarchy of legal instruments:

- **Securities Law (2019):** Establishes the general obligation of listed firms to disclose information truthfully, accurately, completely, and on time (Article 118).
- **Circular 155/2015/TT-BTC** (amended by **Circular 96/2020/TT-BTC**): Specifies the content, format, and deadlines for periodic and event-driven disclosures.
- **SSC decisions and guidance:** Provide implementation details and sector-specific requirements.

The key periodic reporting deadlines are summarized in @tbl-dis-qual-deadlines.

| Report Type | Deadline | Audit Requirement |
|------------------------|------------------------|------------------------|
| Annual financial statements | 90 days after fiscal year-end | Audited |
| Semi-annual financial statements | 45 days after period-end | Reviewed |
| Quarterly financial statements | 20 days after quarter-end | Unaudited |
| Annual report | 110 days after fiscal year-end | N/A (narrative) |

: Periodic disclosure deadlines under Vietnamese securities regulation. {#tbl-dis-qual-deadlines}

Event-driven (ad hoc) disclosures must be filed within 24 hours for material events, including changes in ownership exceeding 1% by major shareholders, board resolutions on dividends or capital increases, and any event that may materially affect the share price.

### Penalties for Non-Compliance

The SSC may impose administrative fines for late or incomplete disclosure, typically VND 50–100 million for minor violations and up to VND 500 million for material omissions. While these amounts are modest relative to the size of large-cap companies, the reputational cost and the risk of trading suspension provide additional deterrence.
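
The periodic deadlines in @tbl-dis-qual-deadlines translate directly into a date calculation. The following is a minimal sketch under stated assumptions: deadlines are counted in calendar days with no adjustment for weekends or public holidays, and the report-type keys and the helper names `disclosure_deadline` and `days_relative_to_deadline` are hypothetical, introduced only for illustration.

```python
from datetime import date, timedelta

# Calendar-day deadlines taken from the deadline table.
# Assumption: no adjustment for weekends or public holidays.
DEADLINE_DAYS = {
    "annual_fs": 90,        # audited annual financial statements
    "semi_annual_fs": 45,   # reviewed semi-annual financial statements
    "quarterly_fs": 20,     # unaudited quarterly financial statements
    "annual_report": 110,   # narrative annual report
}


def disclosure_deadline(period_end: date, report_type: str) -> date:
    """Last permissible filing date for a report (hypothetical helper)."""
    return period_end + timedelta(days=DEADLINE_DAYS[report_type])


def days_relative_to_deadline(period_end: date, filing_date: date,
                              report_type: str) -> int:
    """Negative = filed early, zero = on the deadline, positive = late."""
    return (filing_date - disclosure_deadline(period_end, report_type)).days


fy_end = date(2023, 12, 31)
print(disclosure_deadline(fy_end, "annual_fs"))                          # 2024-03-30
print(days_relative_to_deadline(fy_end, date(2024, 4, 2), "annual_fs"))  # 3
```

A signed day count of this kind keeps early and late filers on a single scale, which is convenient when the filing date itself is treated as a choice variable.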

## Data Construction {#sec-dis-qual-data}

### Loading Required Libraries

```{python}
#| label: setup
#| code-summary: "Import libraries and configure environment"

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from datetime import datetime, timedelta
from scipy import stats
from sklearn.preprocessing import StandardScaler
from linearmodels.panel import PanelOLS
import warnings
warnings.filterwarnings('ignore')

# Plotting defaults
plt.rcParams.update({
    'figure.figsize': (10, 6),
    'figure.dpi': 150,
    'font.size': 11,
    'axes.spines.top': False,
    'axes.spines.right': False
})
```

### Retrieving Disclosure Data

We assume that structured data are available on filing dates, announcement timestamps, and the textual content of corporate disclosures for all listed firms.

```{python}
#| label: data-load
#| code-summary: "Load disclosure and financial data"
#| eval: false

from datacore import DataCoreClient

client = DataCoreClient()

# Filing metadata: announcement dates, filing dates, report types
filings = client.get_filings(
    exchanges=['HOSE', 'HNX'],
    report_types=['annual', 'semi_annual', 'quarterly'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    fields=[
        'ticker', 'report_type', 'fiscal_year', 'fiscal_quarter',
        'fiscal_year_end', 'filing_date', 'announcement_date',
        'auditor', 'audit_opinion', 'file_url'
    ]
)

# Financial statement data
financials = client.get_fundamentals(
    exchanges=['HOSE', 'HNX'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    fields=[
        'ticker', 'fiscal_year', 'fiscal_quarter',
        'total_assets', 'total_equity', 'revenue', 'net_income',
        'operating_cash_flow', 'total_accruals',
        'market_cap', 'book_to_market'
    ]
)

# Daily trading data for event studies
trading = client.get_daily_prices(
    exchanges=['HOSE', 'HNX'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    fields=[
        'ticker', 'date', 'close', 'volume', 'turnover',
        'bid_ask_spread', 'market_return'
    ]
)

# Ownership and governance
governance = client.get_governance(
    exchanges=['HOSE', 'HNX'],
    fields=[
        'ticker', 'fiscal_year', 'state_ownership_pct',
        'foreign_ownership_pct', 'board_size',
        'board_independence_pct', 'big4_auditor', 'dual_listing'
    ]
)

print(f"Filings: {filings.shape[0]:,} records")
print(f"Financials: {financials.shape[0]:,} records")
print(f"Trading: {trading.shape[0]:,} records")
print(f"Governance: {governance.shape[0]:,} records")
```

### Computing Filing Timeliness {#sec-dis-qual-timeliness}

We define **reporting lag** as the number of calendar days between the fiscal period-end and the date the firm's financial statements are made publicly available. For annual reports, the regulatory maximum is 90 days; firms that file earlier than the deadline reveal information sooner, while firms that file late face potential penalties and signal possible difficulties with their accounts.

```{python}
#| label: compute-lag
#| code-summary: "Compute reporting lag for each filing"
#| eval: false

for col in ['fiscal_year_end', 'filing_date', 'announcement_date']:
    filings[col] = pd.to_datetime(filings[col])

# Reporting lag = filing date - fiscal period end.
# Note: `fiscal_year_end` is the period end only for annual reports, so the lag
# and the late-filing flag are meaningful for annual filings (the focus below).
# Interim reports would need their own period-end date.
filings['reporting_lag'] = (
    filings['filing_date'] - filings['fiscal_year_end']
).dt.days

# Regulatory deadline based on report type
deadline_map = {
    'annual': 90,
    'semi_annual': 45,
    'quarterly': 20
}
filings['deadline_days'] = filings['report_type'].map(deadline_map)

# Late filing indicator
filings['late_filing'] = (
    filings['reporting_lag'] > filings['deadline_days']
).astype(int)

# Days relative to deadline (negative = early, positive = late)
filings['days_relative_deadline'] = (
    filings['reporting_lag'] - filings['deadline_days']
)

# Summary statistics
annual_filings = filings[filings['report_type'] == 'annual'].copy()

print("Annual Report Filing Lag (calendar days):")
print(annual_filings['reporting_lag'].describe().round(1))
print(f"\nLate filing rate: {annual_filings['late_filing'].mean():.1%}")
```

### Distribution of Filing Lags

```{python}
#| label: fig-filing-lag-dist
#| eval: false
#| fig-cap: "Distribution of annual report filing lags for Vietnamese listed firms (2012–2024). The dashed red line marks the 90-day regulatory deadline. The distribution is right-skewed, with a mass of filings near the deadline and a tail of late filers."
#| code-summary: "Plot the distribution of annual report filing lags"

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
axes[0].hist(
    annual_filings['reporting_lag'].dropna(),
    bins=60, range=(20, 150),
    color='#2C5F8A', edgecolor='white', alpha=0.85
)
axes[0].axvline(x=90, color='#C0392B', linestyle='--', linewidth=2,
                label='90-day deadline')
axes[0].set_xlabel('Reporting Lag (Calendar Days)')
axes[0].set_ylabel('Number of Filings')
axes[0].set_title('Distribution of Annual Report Filing Lags')
axes[0].legend()

# Time trend: median and interquartile range of the lag by year
median_lag = (
    annual_filings
    .groupby('fiscal_year')['reporting_lag']
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
median_lag.columns = ['p25', 'median', 'p75']

axes[1].fill_between(
    median_lag.index, median_lag['p25'], median_lag['p75'],
    alpha=0.3, color='#2C5F8A', label='IQR'
)
axes[1].plot(
    median_lag.index, median_lag['median'],
    color='#2C5F8A', linewidth=2, marker='o', label='Median'
)
axes[1].axhline(y=90, color='#C0392B', linestyle='--', linewidth=1.5,
                label='Deadline')
axes[1].set_xlabel('Fiscal Year')
axes[1].set_ylabel('Filing Lag (Calendar Days)')
axes[1].set_title('Median Annual Filing Lag Over Time')
axes[1].legend()

plt.tight_layout()
plt.show()
```

## Measuring Disclosure Quality {#sec-dis-qual-quality}

Disclosure quality is inherently multidimensional.
Following @dechow2010understanding and @beyer2010financial, we construct proxies along four dimensions: (i) timeliness, (ii) textual properties, (iii) accounting quality, and (iv) voluntary disclosure breadth.

### Timeliness as a Quality Dimension

Timely disclosure reduces the duration of information asymmetry between insiders and outside investors. @chambers1984timeliness and @givoly1982timeliness document that early reporters tend to announce good news, while late reporters more often deliver bad news. We test this pattern in the Vietnamese context below.

We operationalize timeliness through two measures:

1. **Reporting lag** (continuous): Calendar days from fiscal period-end to filing date, as constructed in @sec-dis-qual-timeliness.
2. **Early/late classification** (categorical): We classify firm-years into terciles based on reporting lag within each fiscal year. This controls for secular trends in filing speed (e.g., driven by regulatory changes or COVID-19 disruptions).

```{python}
#| label: timeliness-terciles
#| code-summary: "Classify filings into timeliness terciles"
#| eval: false

# Many firms file on the same day (especially at the deadline), so raw lags can
# produce duplicate bin edges in qcut. Ranking with method='first' breaks ties
# and keeps the three groups equally sized within each year.
annual_filings['lag_tercile'] = (
    annual_filings
    .groupby('fiscal_year')['reporting_lag']
    .transform(lambda x: pd.qcut(x.rank(method='first'), 3,
                                 labels=['Early', 'Middle', 'Late']))
)

# Tabulate
tercile_stats = (
    annual_filings
    .groupby('lag_tercile', observed=True)['reporting_lag']
    .agg(['count', 'mean', 'median', 'std'])
    .round(1)
)
tercile_stats.columns = ['N', 'Mean Lag', 'Median Lag', 'SD']
print(tercile_stats)
```

### Textual Quality Measures {#sec-dis-qual-textual}

The textual properties of corporate disclosures convey information about quality beyond what is captured by accounting numbers alone. @li2008annual demonstrates that annual reports with lower readability are associated with lower earnings persistence, suggesting that complex language may obscure unfavourable information.
@loughran2014measuring critique the application of general readability formulas (Fog index, Flesch-Kincaid) to financial text, arguing that these metrics confound complexity with technical terminology.

We construct three textual quality measures adapted for Vietnamese corporate disclosures: document length, numerical density, and forward-looking statement density.

#### Document Length and Specificity

Longer disclosures are not inherently better: length may reflect boilerplate or obfuscation. However, @dyer2017evolution show that the *informative* component of disclosure (as opposed to standard legal language) has increased over time in U.S. 10-K filings. We measure:

- **Total word count:** Raw length of the annual report narrative sections (MD&A equivalent).
- **Numerical density:** Proportion of tokens that are numbers, percentages, or currency amounts, which serves as a proxy for specificity.

```{python}
#| label: textual-measures
#| code-summary: "Compute textual disclosure quality measures"
#| eval: false

import re
from underthesea import word_tokenize

def compute_textual_metrics(text):
    """Compute textual quality metrics for Vietnamese corporate text."""
    # Guard against missing values (NaN) as well as empty strings
    if not isinstance(text, str) or len(text.strip()) == 0:
        return {
            'word_count': 0, 'sentence_count': 0,
            'numerical_density': 0, 'avg_sentence_length': 0,
            'unique_word_ratio': 0, 'forward_looking_density': 0
        }

    # Vietnamese word segmentation
    tokens = word_tokenize(text)

    # Split only where sentence-ending punctuation is followed by whitespace,
    # so separators inside numbers (e.g. 1.200) do not create false sentences
    sentences = re.split(r'(?<=[.!?。])\s+', text)
    sentences = [s.strip() for s in sentences if len(s.strip()) > 5]

    word_count = len(tokens)
    sentence_count = max(len(sentences), 1)

    # Numerical density: proportion of tokens that are numeric.
    # The pattern requires a leading digit so that bare punctuation tokens
    # such as "." or "," are not counted as numbers.
    num_pattern = re.compile(r'^\d[\d,.]*%?$')
    numeric_tokens = sum(1 for t in tokens if num_pattern.match(t))
    numerical_density = numeric_tokens / max(word_count, 1)

    # Lexical diversity: unique words / total words
    unique_words = len(set(t.lower() for t in tokens))
    unique_word_ratio = unique_words / max(word_count, 1)

    # Forward-looking statement density (keyword matches per sentence).
    # str.count matches substrings, so this is an approximate count.
    forward_keywords = [
        'dự kiến', 'kế hoạch', 'mục tiêu', 'triển vọng',
        'định hướng', 'chiến lược', 'tương lai', 'sẽ',
        'dự báo', 'phấn đấu', 'cam kết', 'hướng tới'
    ]
    text_lower = text.lower()
    forward_count = sum(text_lower.count(kw) for kw in forward_keywords)
    forward_looking_density = forward_count / max(sentence_count, 1)

    return {
        'word_count': word_count,
        'sentence_count': sentence_count,
        'numerical_density': numerical_density,
        'avg_sentence_length': word_count / sentence_count,
        'unique_word_ratio': unique_word_ratio,
        'forward_looking_density': forward_looking_density
    }

# Retrieve annual report text from DataCore
annual_text = client.get_annual_report_text(
    exchanges=['HOSE', 'HNX'],
    start_date='2012-01-01',
    end_date='2024-12-31',
    sections=['mda', 'business_overview', 'risk_factors']
)

# Apply textual metrics
textual_metrics = annual_text.apply(
    lambda row: compute_textual_metrics(row['text']),
    axis=1, result_type='expand'
)
annual_text = pd.concat([annual_text, textual_metrics], axis=1)

print("Textual Quality Summary Statistics:")
print(annual_text[['word_count', 'numerical_density',
                   'avg_sentence_length', 'unique_word_ratio',
                   'forward_looking_density']].describe().round(3))
```

#### Forward-Looking Statement Density

Forward-looking statements reveal management's expectations about future performance and are considered a higher-quality form of disclosure because they expose the manager to ex-post evaluation. In Vietnamese reports, forward-looking language typically appears in phrases such as *dự kiến* (expected), *kế hoạch* (plan), *mục tiêu* (target), and *triển vọng* (outlook). @guay2016guiding show that managers use voluntary disclosure to "guide through the fog" when financial statements are complex. We operationalize forward-looking density as the number of forward-looking phrases per sentence, following the keyword approach in the `compute_textual_metrics` function above.
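
To make the phrases-per-sentence definition concrete, the sketch below computes forward-looking density on a short invented passage using only the standard library. It is an illustration under assumptions, not the chapter's pipeline: the sample text is made up, the keyword list is a four-phrase subset of the one used in this section, the helper name `forward_looking_density` is hypothetical, and the regex sentence splitter is a crude stand-in for proper Vietnamese text processing.

```python
import re
import unicodedata

# Subset of the chapter's keyword list, used here for illustration only.
FORWARD_KEYWORDS = ["dự kiến", "kế hoạch", "mục tiêu", "triển vọng"]


def forward_looking_density(text: str) -> float:
    """Forward-looking keyword matches per sentence (hypothetical helper)."""
    # Normalize to NFC so composed and decomposed Vietnamese diacritics match.
    text = unicodedata.normalize("NFC", text).strip()
    keywords = [unicodedata.normalize("NFC", kw) for kw in FORWARD_KEYWORDS]

    # Split where sentence-ending punctuation is followed by whitespace, so the
    # thousands separator in "1.200" is not treated as a sentence boundary.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

    text_lower = text.lower()
    hits = sum(text_lower.count(kw) for kw in keywords)
    return hits / max(len(sentences), 1)


# Invented three-sentence sample: two forward-looking phrases, one number-only
# sentence with no forward-looking language.
sample = (
    "Công ty dự kiến doanh thu tăng 12,5% trong năm tới. "
    "Kế hoạch mở rộng nhà máy đã được phê duyệt. "
    "Lợi nhuận năm nay đạt 1.200 tỷ đồng."
)
print(round(forward_looking_density(sample), 3))  # 0.667
```

Because the count relies on substring matching, short keywords can over-count in longer texts; matching on segmented tokens instead would be a natural refinement.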

### Accounting-Based Quality Proxies {#sec-dis-qual-accruals-quality}

We complement textual measures with accounting-based proxies that capture the reliability of reported financial information.

#### Accruals Quality

Following @francis2005market, we measure accruals quality as the standard deviation of residuals from the Dechow-Dichev (DD) regression of working capital accruals on past, current, and future operating cash flows:

$$
\frac{WC_{i,t}}{A_{i,t-1}} = \alpha + \beta_1 \frac{CFO_{i,t-1}}{A_{i,t-1}} + \beta_2 \frac{CFO_{i,t}}{A_{i,t-1}} + \beta_3 \frac{CFO_{i,t+1}}{A_{i,t-1}} + \varepsilon_{i,t}
$$ {#eq-accruals-quality}

where $WC_{i,t}$ is working capital accruals, $CFO_{i,t}$ is operating cash flow, and $A_{i,t-1}$ is lagged total assets. The firm-level standard deviation of $\hat{\varepsilon}_{i,t}$ over a rolling window (typically 5 years) is the accruals quality measure, with higher values indicating lower quality.

```{python}
#| label: accruals-quality
#| eval: false
#| code-summary: "Estimate accruals quality (Dechow-Dichev residual volatility)"

def estimate_accruals_quality(df, min_obs=5):
    """
    Estimate accruals quality as std dev of DD residuals
    over a rolling 5-year window for each firm.

    Notes
    -----
    - The data expose only `total_accruals`, which is used here as a proxy
      for working capital accruals.
    - With five observations and four estimated parameters, each window
      leaves a single residual degree of freedom, so estimates are noisy;
      a longer window gives more stable values.
    """
    results = []

    for ticker, group in df.groupby('ticker'):
        group = group.sort_values('fiscal_year').copy()

        # Construct leads/lags of CFO
        group['cfo_lag1'] = group['operating_cash_flow'].shift(1)
        group['cfo_lead1'] = group['operating_cash_flow'].shift(-1)

        # Scale by lagged assets
        group['lag_assets'] = group['total_assets'].shift(1)
        for col in ['total_accruals', 'operating_cash_flow',
                    'cfo_lag1', 'cfo_lead1']:
            group[f'{col}_scaled'] = group[col] / group['lag_assets']

        # Rolling 5-year residual std dev
        for idx in range(len(group)):
            window = group.iloc[max(0, idx - 4):idx + 1]
            window = window.dropna(subset=[
                'total_accruals_scaled', 'operating_cash_flow_scaled',
                'cfo_lag1_scaled', 'cfo_lead1_scaled'
            ])

            if len(window) >= min_obs:
                y = window['total_accruals_scaled']
                X = sm.add_constant(window[[
                    'cfo_lag1_scaled', 'operating_cash_flow_scaled',
                    'cfo_lead1_scaled'
                ]])
                try:
                    model = sm.OLS(y, X).fit()
                except (np.linalg.LinAlgError, ValueError):
                    # Skip windows where the regression cannot be estimated
                    continue
                results.append({
                    'ticker': ticker,
                    # Cast to int so the merge key matches other tables
                    'fiscal_year': int(group.iloc[idx]['fiscal_year']),
                    'accruals_quality': model.resid.std()
                })

    return pd.DataFrame(results)

aq_df = estimate_accruals_quality(financials)
print(f"Accruals quality computed for {aq_df['ticker'].nunique()} firms")
print(aq_df['accruals_quality'].describe().round(4))
```

#### Earnings Persistence and Predictability

Persistent earnings are more useful for valuation. We estimate earnings persistence as the slope coefficient $\phi_1$ from a first-order autoregression:

$$
\frac{E_{i,t}}{A_{i,t-1}} = \phi_0 + \phi_1 \frac{E_{i,t-1}}{A_{i,t-2}} + \nu_{i,t}
$$ {#eq-persistence}

Higher $\hat{\phi}_1$ indicates more persistent (and arguably higher-quality) earnings.

```{python}
#| label: earnings-persistence
#| eval: false
#| code-summary: "Estimate firm-level earnings persistence"

def estimate_persistence(df, min_obs=5):
    """Estimate earnings persistence via AR(1) model."""
    results = []

    for ticker, group in df.groupby('ticker'):
        group = group.sort_values('fiscal_year').copy()
        # Earnings scaled by lagged total assets, and its one-year lag
        group['earnings_scaled'] = (
            group['net_income'] / group['total_assets'].shift(1)
        )
        group['earnings_lag'] = group['earnings_scaled'].shift(1)

        clean = group.dropna(subset=['earnings_scaled', 'earnings_lag'])
        if len(clean) >= min_obs:
            y = clean['earnings_scaled']
            X = sm.add_constant(clean[['earnings_lag']])
            model = sm.OLS(y, X).fit()
            results.append({
                'ticker': ticker,
                'persistence': model.params['earnings_lag'],
                'persistence_se': model.bse['earnings_lag'],
                'r_squared': model.rsquared,
                'n_obs': model.nobs
            })

    return pd.DataFrame(results)

persistence_df = estimate_persistence(financials)
print(persistence_df[['persistence', 'r_squared']].describe().round(3))
```

### Composite Disclosure Quality Index {#sec-dis-qual-composite}

Individual quality proxies capture different facets of the information environment. To aggregate them into a single score while avoiding arbitrary weighting, we follow @lang1993cross and use a rank-based composite. Within each fiscal year, we rank firms on each of the dimensions in @tbl-dis-qual-composite, oriented so that a higher rank means higher quality.

| Dimension | Proxy | Direction |
|------------------|-----------------------|-------------------|
| Timeliness | Reporting lag | Lower is better |
| Specificity | Numerical density | Higher is better |
| Forward-looking | FLS density | Higher is better |
| Earnings quality | Accruals quality (DD) | Lower σ is better |
| Persistence | AR(1) coefficient | Higher is better |

: Components of the composite disclosure quality index. {#tbl-dis-qual-composite}

We convert each proxy to a percentile rank within each fiscal year (so each component ranges from 0 to 1), then average across components:

$$
DQ_{i,t} = \frac{1}{K} \sum_{k=1}^{K} \text{Rank}_{k,i,t}
$$ {#eq-dq-index}

where $K$ is the number of available components and $\text{Rank}_{k,i,t}$ is the percentile rank of firm $i$ in year $t$ on dimension $k$.

```{python}
#| label: composite-index
#| code-summary: "Construct composite disclosure quality index"
#| eval: false

# Merge all quality proxies. Persistence is a firm-level (time-invariant)
# estimate, so it is merged on ticker only.
quality_panel = (
    annual_filings[['ticker', 'fiscal_year', 'reporting_lag']]
    .merge(
        annual_text[['ticker', 'fiscal_year', 'numerical_density',
                     'forward_looking_density']],
        on=['ticker', 'fiscal_year'], how='left'
    )
    .merge(aq_df, on=['ticker', 'fiscal_year'], how='left')
    .merge(persistence_df[['ticker', 'persistence']], on='ticker', how='left')
)

# Percentile-rank each component within fiscal year, oriented so that a
# higher rank always means higher quality.
components = {
    'reporting_lag': False,            # lower lag = better
    'numerical_density': True,         # higher = better
    'forward_looking_density': True,   # higher = better
    'accruals_quality': False,         # lower residual volatility = better
    'persistence': True                # higher = better
}

for col, higher_is_better in components.items():
    quality_panel[f'rank_{col}'] = (
        quality_panel
        .groupby('fiscal_year')[col]
        .rank(pct=True, ascending=higher_is_better)
    )

# Composite index: average of available ranks (missing components are skipped)
rank_columns = [f'rank_{col}' for col in components]
quality_panel['dq_index'] = quality_panel[rank_columns].mean(axis=1)

print("Disclosure Quality Index Distribution:")
print(quality_panel['dq_index'].describe().round(3))
```

```{python}
#| label: fig-dq-distribution
#| fig-cap: "Distribution of the composite disclosure quality index across Vietnamese listed firms. The index aggregates five dimensions: timeliness, numerical specificity, forward-looking statement density, accruals quality, and earnings persistence. Higher values indicate better disclosure quality."
#| eval: false
#| code-summary: "Plot distribution of composite disclosure quality index"

fig, ax = plt.subplots(figsize=(10, 5))

ax.hist(quality_panel['dq_index'].dropna(), bins=50,
        color='#2C5F8A', edgecolor='white', alpha=0.85)
ax.axvline(quality_panel['dq_index'].median(), color='#E67E22',
           linestyle='--', linewidth=2, label='Median')
ax.set_xlabel('Disclosure Quality Index')
ax.set_ylabel('Number of Firm-Years')
ax.set_title('Distribution of Composite Disclosure Quality')
ax.legend()

plt.tight_layout()
plt.show()
```

## Determinants of Disclosure Quality {#sec-dis-qual-determinants}

What drives variation in disclosure quality across Vietnamese firms? We estimate a pooled regression of the composite DQ index on firm characteristics and governance variables:

$$
DQ_{i,t} = \alpha + \beta_1 \ln(\text{Size}_{i,t}) + \beta_2 \text{ROA}_{i,t} + \beta_3 \text{Lev}_{i,t} + \beta_4 \text{StateOwn}_{i,t} + \beta_5 \text{ForeignOwn}_{i,t} + \beta_6 \text{Big4}_{i,t} + \beta_7 \text{BoardIndep}_{i,t} + \gamma_t + \varepsilon_{i,t}
$$ {#eq-determinants}

where $\gamma_t$ are year fixed effects. The theoretical predictions, drawing on @lang1993cross, @hope2003disclosure, and @bushman2004financial, are:

- **Size (+):** Larger firms face greater public scrutiny and have lower proprietary costs relative to the benefits of disclosure.
- **ROA (+/−):** Profitable firms may disclose more to signal quality, but firms managing earnings downward (for tax purposes) may reduce disclosure to avoid scrutiny.
- **Leverage (+):** @sengupta1998corporate argues that firms with more debt have stronger incentives to maintain disclosure quality to lower borrowing costs.
- **State ownership (−):** SOEs may face weaker market discipline and political incentives to limit transparency.
- **Foreign ownership (+):** Foreign institutional investors demand higher transparency.
- **Big 4 auditor (+):** High-quality auditors constrain earnings management and indirectly improve disclosure quality.
- **Board independence (+):** Independent directors improve monitoring and encourage more informative disclosure.

```{python}
#| label: determinants-regression
#| code-summary: "Estimate determinants of disclosure quality"
#| eval: false

# Merge quality index with financials and governance
det_panel = (
    quality_panel[['ticker', 'fiscal_year', 'dq_index']]
    .merge(financials, on=['ticker', 'fiscal_year'], how='left')
    .merge(governance, on=['ticker', 'fiscal_year'], how='left')
)

# Construct variables
det_panel['log_size'] = np.log(det_panel['total_assets'])
det_panel['roa'] = det_panel['net_income'] / det_panel['total_assets']
det_panel['leverage'] = (
    (det_panel['total_assets'] - det_panel['total_equity'])
    / det_panel['total_assets']
)

# Panel regression with year FE, standard errors clustered by firm
det_panel = det_panel.set_index(['ticker', 'fiscal_year'])

model_det = PanelOLS(
    dependent=det_panel['dq_index'],
    exog=sm.add_constant(det_panel[[
        'log_size', 'roa', 'leverage',
        'state_ownership_pct', 'foreign_ownership_pct',
        'big4_auditor', 'board_independence_pct'
    ]]),
    entity_effects=False,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print(model_det.summary)
```

```{python}
#| label: fig-determinants-coefficients
#| fig-cap: "Coefficient estimates from regressing the disclosure quality index on firm characteristics. Horizontal bars represent 95% confidence intervals. Foreign ownership and Big 4 auditor engagement are the strongest predictors of disclosure quality."
#| code-summary: "Visualize determinant regression coefficients"
#| eval: false

coefs = model_det.params.drop('const')
ci = model_det.conf_int().drop('const')

fig, ax = plt.subplots(figsize=(8, 5))

y_pos = np.arange(len(coefs))
# Labels follow the order of the regressors in the model above
labels = [
    'ln(Assets)', 'ROA', 'Leverage', 'State Own %',
    'Foreign Own %', 'Big 4 Auditor', 'Board Indep %'
]

colors = ['#2C5F8A' if c > 0 else '#C0392B' for c in coefs.values]
ax.barh(y_pos, coefs.values, color=colors, alpha=0.8, height=0.6)
ax.errorbar(
    coefs.values, y_pos,
    xerr=[coefs.values - ci.iloc[:, 0].values,
          ci.iloc[:, 1].values - coefs.values],
    fmt='none', color='black', capsize=3
)
ax.axvline(x=0, color='gray', linewidth=0.8, linestyle='-')
ax.set_yticks(y_pos)
ax.set_yticklabels(labels)
ax.set_xlabel('Coefficient Estimate')
ax.set_title('Determinants of Disclosure Quality')

plt.tight_layout()
plt.show()
```

## Strategic Disclosure Timing {#sec-dis-qual-strategic-timing}

### Day-of-Week Effects

@dellavigna2009investor document that Friday earnings announcements receive less immediate market attention. We test whether bad news is disproportionately announced on Fridays in Vietnam, where the trading week runs Monday through Friday but the retail-dominated investor base may exhibit different attention patterns.

```{python}
#| label: day-of-week
#| code-summary: "Analyze day-of-week patterns in earnings announcements"
#| eval: false

annual_filings['announcement_dow'] = (
    annual_filings['announcement_date'].dt.dayofweek
)
annual_filings['day_name'] = (
    annual_filings['announcement_date'].dt.day_name()
)

# Compute surprise: actual earnings minus naive expectation (last year's earnings)
annual_filings = annual_filings.merge(
    financials[['ticker', 'fiscal_year', 'net_income', 'total_assets']],
    on=['ticker', 'fiscal_year'], how='left'
)
annual_filings['earnings_scaled'] = (
    annual_filings['net_income'] / annual_filings['total_assets']
)
# Sort so that diff() compares each year with the same firm's previous year
annual_filings = annual_filings.sort_values(['ticker', 'fiscal_year'])
annual_filings['earnings_surprise'] = (
    annual_filings
    .groupby('ticker')['earnings_scaled']
    .diff()
)

# Classify as good/bad news
annual_filings['bad_news'] = (
    annual_filings['earnings_surprise'] < 0
).astype(int)

# Each firm's first year has no surprise; exclude those rows here rather
# than letting them count as good news
has_surprise = annual_filings['earnings_surprise'].notna()

# Day-of-week distribution by news type
dow_crosstab = pd.crosstab(
    annual_filings.loc[has_surprise, 'day_name'],
    annual_filings.loc[has_surprise, 'bad_news']
        .map({0: 'Good News', 1: 'Bad News'}),
    normalize='columns'
)

# Reorder days
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
dow_crosstab = dow_crosstab.reindex(day_order)

print("Proportion of Announcements by Day and News Type:")
print(dow_crosstab.round(3))
```

```{python}
#| label: fig-dow-pattern
#| fig-cap: "Day-of-week distribution of earnings announcements, split by news type. If firms strategically time bad news to low-attention days, we expect a higher proportion of bad-news announcements on Fridays. In Vietnam, the pattern may differ from U.S. evidence due to the retail-dominated investor base."
#| eval: false
#| code-summary: "Plot day-of-week announcement patterns"

fig, ax = plt.subplots(figsize=(10, 5))

x = np.arange(len(day_order))
width = 0.35

bad_pct = dow_crosstab['Bad News'].values
good_pct = dow_crosstab['Good News'].values

ax.bar(x - width/2, good_pct, width, label='Good News',
       color='#27AE60', alpha=0.8)
ax.bar(x + width/2, bad_pct, width, label='Bad News',
       color='#C0392B', alpha=0.8)

ax.set_xticks(x)
ax.set_xticklabels(day_order)
ax.set_ylabel('Proportion of Announcements')
ax.set_title('Strategic Timing: Day-of-Week Announcement Patterns')
ax.legend()

plt.tight_layout()
plt.show()
```

### Announcement Congestion

When many firms announce on the same day, each announcement is likely to receive less attention. We measure **announcement congestion** as the number of other firms making earnings announcements on the same date:

$$
\text{Congestion}_{i,t} = \sum_{j \neq i} \mathbf{1}\{\text{AnnDate}_{j,t} = \text{AnnDate}_{i,t}\}
$$ {#eq-congestion}

The distraction evidence in @hirshleifer2009driven implies that firms seeking to bury bad news will choose high-congestion days.
We test this by regressing the congestion variable on a bad-news indicator and controls:

```{python}
#| label: congestion-analysis
#| eval: false
#| code-summary: "Test whether bad-news firms choose high-congestion announcement days"

# Count announcements per calendar date (drop any time-of-day component)
annual_filings['ann_day'] = annual_filings['announcement_date'].dt.normalize()
ann_counts = (
    annual_filings
    .groupby('ann_day')
    .size()
    .reset_index(name='n_announcements')
)

annual_filings = annual_filings.merge(
    ann_counts, on='ann_day', how='left'
)
annual_filings['congestion'] = annual_filings['n_announcements'] - 1

# Build the estimation sample explicitly so that the cluster variable has
# the same length as the data used in the regression
reg_df = (
    annual_filings
    .assign(
        log_size=np.log(annual_filings['total_assets']),
        roa=annual_filings['net_income'] / annual_filings['total_assets']
    )
    .dropna(subset=['congestion', 'earnings_surprise', 'bad_news',
                    'log_size', 'roa', 'fiscal_year', 'ticker'])
)

# Regression: congestion ~ bad_news + controls, clustered by firm
congestion_model = smf.ols(
    'congestion ~ bad_news + log_size + roa + C(fiscal_year)',
    data=reg_df
).fit(cov_type='cluster', cov_kwds={'groups': reg_df['ticker']})

print("Congestion Regression:")
print(congestion_model.summary().tables[1])
```

### After-Hours and Weekend Announcements

Vietnamese regulations require disclosure within 24 hours of material events, but firms retain discretion over the exact timing. Announcements made after the trading session closes (after 3:00 PM on HOSE/HNX) or on weekends delay the market's opportunity to react by at least one trading day.

```{python}
#| label: after-hours
#| code-summary: "Identify and analyze after-hours announcement patterns"
#| eval: false

# Assumes `announcement_date` carries a time-of-day component; if it holds
# dates only, every hour is 0 and only weekend filings are flagged
annual_filings['ann_hour'] = (
    annual_filings['announcement_date'].dt.hour
)
annual_filings['after_hours'] = (
    (annual_filings['ann_hour'] >= 15) |
    (annual_filings['announcement_dow'] >= 5)  # Saturday/Sunday
).astype(int)

# Keep only firm-years with a defined earnings surprise
timing_sample = annual_filings[annual_filings['earnings_surprise'].notna()]

# Cross-tabulate after-hours by news type
afterhours_crosstab = pd.crosstab(
    timing_sample['after_hours'].map({0: 'During Hours', 1: 'After Hours'}),
    timing_sample['bad_news'].map({0: 'Good News', 1: 'Bad News'}),
    normalize='index'
)

print("News Distribution by Announcement Timing:")
print(afterhours_crosstab.round(3))

# Chi-squared test of independence between timing and news type
contingency = pd.crosstab(
    timing_sample['after_hours'],
    timing_sample['bad_news']
)
chi2, p_val, _, _ = stats.chi2_contingency(contingency)
print(f"\nChi-squared = {chi2:.2f}, p-value = {p_val:.4f}")
```

## Market Consequences of Disclosure Quality {#sec-dis-qual-consequences}

### Disclosure Quality and the Cost of Equity

The central prediction of @diamond1991disclosure and @botosan1997disclosure is that higher-quality disclosure lowers the cost of equity capital by reducing information asymmetry. We test this using the implied cost of capital (ICC) approach, which estimates the discount rate that equates the current price to the present value of expected future earnings. We use the PEG ratio approach as a simple ICC estimate:

$$
r_{PEG,i,t} = \sqrt{\frac{\hat{E}_{i,t+2} - \hat{E}_{i,t+1}}{P_{i,t}}}
$$ {#eq-icc}

where $\hat{E}_{i,t+k}$ is the consensus earnings forecast (or, in the absence of analyst coverage, a model-based forecast) and $P_{i,t}$ is the current stock price.
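
A short numerical illustration of @eq-icc may help fix ideas. The sketch below uses invented per-share figures and a hypothetical helper `icc_peg`; it also checks that the estimate is unchanged when earnings and price are both expressed at the firm level (net income and market capitalization), since the share count cancels inside the ratio.

```python
import math


def icc_peg(e1: float, e2: float, price: float) -> float:
    """PEG-implied cost of equity, sqrt((E2 - E1) / P).

    Hypothetical helper: returns NaN when the measure is undefined, i.e. when
    forecast earnings do not grow or the price is not positive.
    """
    if price <= 0 or e2 <= e1:
        return float("nan")
    return math.sqrt((e2 - e1) / price)


# Invented inputs: EPS forecasts of VND 2,000 and 2,500, price of VND 50,000
r_per_share = icc_peg(2_000.0, 2_500.0, 50_000.0)
print(f"{r_per_share:.3f}")  # 0.100

# Same firm expressed in totals, assuming 100 million shares outstanding
shares = 100e6
r_firm_level = icc_peg(2_000.0 * shares, 2_500.0 * shares, 50_000.0 * shares)
print(f"{r_firm_level:.3f}")  # 0.100
```

The undefined case matters in practice: firm-years with flat or declining forecast earnings have no PEG-based estimate and are better treated as missing than as a zero cost of capital.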
```{python}
#| label: cost-of-equity
#| code-summary: "Estimate implied cost of equity and test disclosure quality effect"
#| eval: false

# Construct earnings forecasts using a simple random walk with drift
forecasts = financials.sort_values(['ticker', 'fiscal_year']).copy()
# Earnings are scaled by market value here, so 'eps' is an earnings yield (E / P)
forecasts['eps'] = forecasts['net_income'] / forecasts['market_cap']
forecasts['eps_growth'] = forecasts.groupby('ticker')['eps'].pct_change()

# Simple forecast: E[t+1] = E[t] * (1 + avg_growth)
forecasts['avg_growth'] = (
    forecasts.groupby('ticker')['eps_growth']
    .transform(lambda x: x.rolling(3, min_periods=2).mean())
)
forecasts['eps_f1'] = forecasts['eps'] * (1 + forecasts['avg_growth'])
forecasts['eps_f2'] = forecasts['eps_f1'] * (1 + forecasts['avg_growth'])

# PEG-based ICC
# Because earnings are already divided by market value, the forecast
# difference is the (E2 - E1) / P term and needs no further price scaling.
# The estimate is undefined when forecast growth is non-positive.
eps_change = forecasts['eps_f2'] - forecasts['eps_f1']
forecasts['icc_peg'] = np.sqrt(eps_change.where(eps_change > 0))

# Merge with disclosure quality
icc_panel = (
    forecasts[['ticker', 'fiscal_year', 'icc_peg']]
    .merge(quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True),
           on=['ticker', 'fiscal_year'], how='inner')
    .merge(governance, on=['ticker', 'fiscal_year'], how='left')
    .merge(financials[['ticker', 'fiscal_year', 'total_assets',
                       'book_to_market', 'market_cap']],
           on=['ticker', 'fiscal_year'], how='left')
)
icc_panel['log_size'] = np.log(icc_panel['market_cap'])
icc_panel = icc_panel.set_index(['ticker', 'fiscal_year'])

# Panel regression: ICC ~ DQ + controls
icc_model = PanelOLS(
    dependent=icc_panel['icc_peg'],
    exog=sm.add_constant(icc_panel[[
        'dq_index', 'log_size', 'book_to_market'
    ]]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("Implied Cost of Capital ~ Disclosure Quality:")
print(icc_model.summary)
```

### Disclosure Quality and Liquidity

@diamond1991disclosure predict that better disclosure reduces adverse selection and improves liquidity.
We measure liquidity through bid-ask spreads and the Amihud illiquidity ratio:

$$
\text{Amihud}_{i,t} = \frac{1}{D_{i,t}} \sum_{d=1}^{D_{i,t}} \frac{|R_{i,d}|}{\text{Volume}_{i,d}}
$$ {#eq-amihud}

where $D_{i,t}$ is the number of trading days for firm $i$ in year $t$, $R_{i,d}$ is the daily return, and $\text{Volume}_{i,d}$ is the daily trading volume in VND.

```{python}
#| label: liquidity-analysis
#| code-summary: "Compute Amihud illiquidity and test disclosure quality effect"
#| eval: false

# Compute annual Amihud illiquidity
# Returns are computed within ticker so that the first day of one firm
# is not compared with the last day of another
trading = trading.sort_values(['ticker', 'date'])
trading['abs_return'] = trading.groupby('ticker')['close'].pct_change().abs()
# Zero-volume days produce a division by zero; treat them as missing
trading['amihud_daily'] = (
    trading['abs_return'] / (trading['volume'] * trading['close'])
).replace([np.inf, -np.inf], np.nan)

amihud_annual = (
    trading
    .assign(fiscal_year=trading['date'].dt.year)
    .groupby(['ticker', 'fiscal_year'])
    .agg(
        amihud=('amihud_daily', 'mean'),
        avg_spread=('bid_ask_spread', 'mean'),
        avg_turnover=('turnover', 'mean')
    )
    .reset_index()
)

# Log transform for better distributional properties
amihud_annual['log_amihud'] = np.log(amihud_annual['amihud'] + 1e-10)
amihud_annual['log_spread'] = np.log(amihud_annual['avg_spread'] + 1e-6)

# Merge and run regression
liq_panel = (
    amihud_annual
    .merge(quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True),
           on=['ticker', 'fiscal_year'], how='inner')
    .merge(financials[['ticker', 'fiscal_year', 'market_cap', 'total_assets']],
           on=['ticker', 'fiscal_year'], how='left')
)
liq_panel['log_size'] = np.log(liq_panel['market_cap'])
liq_panel = liq_panel.set_index(['ticker', 'fiscal_year'])

liq_model = PanelOLS(
    dependent=liq_panel['log_amihud'],
    exog=sm.add_constant(liq_panel[['dq_index', 'log_size']]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("Amihud Illiquidity ~ Disclosure Quality:")
print(liq_model.summary)
```

```{python}
#| label: fig-dq-liquidity
#| eval: false
#| fig-cap: "Disclosure quality quintiles and average Amihud illiquidity. Firms in the highest disclosure quality quintile exhibit substantially lower illiquidity, consistent with the prediction that better disclosure reduces information asymmetry and improves market liquidity."
#| code-summary: "Plot disclosure quality vs liquidity relationship"

liq_panel_plot = liq_panel.reset_index()
liq_panel_plot['dq_quintile'] = pd.qcut(
    liq_panel_plot['dq_index'], 5,
    labels=['Q1\n(Low)', 'Q2', 'Q3', 'Q4', 'Q5\n(High)']
)

quintile_liq = (
    liq_panel_plot
    .groupby('dq_quintile')['log_amihud']
    .agg(['mean', 'sem'])
)

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(
    range(5), quintile_liq['mean'],
    yerr=1.96 * quintile_liq['sem'],
    color=['#C0392B', '#E67E22', '#F1C40F', '#27AE60', '#2C5F8A'],
    alpha=0.85, capsize=4, edgecolor='white'
)
ax.set_xticks(range(5))
ax.set_xticklabels(quintile_liq.index)
ax.set_xlabel('Disclosure Quality Quintile')
ax.set_ylabel('Log Amihud Illiquidity')
ax.set_title('Disclosure Quality and Market Liquidity')
plt.tight_layout()
plt.show()
```

### Event Study: Market Reaction to Filing Lag

We examine whether the market reacts differently to early vs. late filers by computing cumulative abnormal returns (CARs) around the filing date:

$$
CAR_{i}[\tau_1, \tau_2] = \sum_{t=\tau_1}^{\tau_2} (R_{i,t} - \hat{R}_{i,t})
$$ {#eq-car}

where $\hat{R}_{i,t}$ is the expected return from a market model estimated over a pre-event window $[-250, -30]$.
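The mechanics of @eq-car can be illustrated on simulated data. The sketch below is a minimal example under stated assumptions: the return series, the true beta of 1.2, and the event window $[-5, +10]$ are all hypothetical choices, and `np.polyfit` stands in for a full OLS routine when fitting the market model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily returns indexed by event time -250..+10 (hypothetical data)
event_time = np.arange(-250, 11)
market_ret = rng.normal(0.0005, 0.01, size=event_time.size)
firm_ret = 0.0002 + 1.2 * market_ret + rng.normal(0, 0.015, size=event_time.size)

# Step 1: estimate the market model on the pre-event window [-250, -30]
est = (event_time >= -250) & (event_time <= -30)
beta, alpha = np.polyfit(market_ret[est], firm_ret[est], deg=1)

# Step 2: abnormal return = realized return minus market-model prediction
ev = (event_time >= -5) & (event_time <= 10)
abnormal = firm_ret[ev] - (alpha + beta * market_ret[ev])

# Step 3: cumulate abnormal returns across the event window
car_path = np.cumsum(abnormal)

print(f"Estimated beta: {beta:.3f}")
print(f"CAR[-5, +10]:   {car_path[-1]:.4f}")
```

Because the simulated firm has no event-day shock, the CAR path here reflects only noise. In an application, the last element of `car_path` is the window CAR and the full path is what an event-study plot displays.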
```{python}
#| label: event-study
#| eval: false
#| code-summary: "Conduct event study around filing dates by timeliness group"

def compute_car(ticker, event_date, trading_df,
                est_window=(-250, -30), event_window=(-5, 10)):
    """Compute the CAR path around an event date using the market model."""
    firm_data = trading_df[trading_df['ticker'] == ticker].copy()
    firm_data = firm_data.sort_values('date').reset_index(drop=True)
    # Compute returns on the full series so the first day of each
    # window still has a valid return
    firm_data['ret'] = firm_data['close'].pct_change()

    # Find the first trading day on or after the event date
    event_rows = firm_data.index[firm_data['date'] >= event_date]
    if len(event_rows) == 0:
        return None
    event_pos = event_rows[0]

    # Check sufficient data
    if event_pos + est_window[0] < 0:
        return None

    # Estimation window
    est_start = event_pos + est_window[0]
    est_end = event_pos + est_window[1]
    est_data = firm_data.iloc[est_start:est_end + 1]

    firm_ret = est_data['ret']
    mkt_ret = est_data['market_return']
    valid = firm_ret.notna() & mkt_ret.notna()

    if valid.sum() < 100:
        return None

    # Market model
    X = sm.add_constant(mkt_ret[valid])
    model = sm.OLS(firm_ret[valid], X).fit()

    # Event window
    ev_start = event_pos + event_window[0]
    ev_end = event_pos + event_window[1]
    ev_data = firm_data.iloc[ev_start:ev_end + 1]

    expected_ret = (model.params['const']
                    + model.params['market_return'] * ev_data['market_return'])
    abnormal_ret = ev_data['ret'] - expected_ret

    return abnormal_ret.cumsum().values

# Sample: compute CARs for annual filings
car_results = []
for _, row in annual_filings.sample(min(2000, len(annual_filings))).iterrows():
    car = compute_car(row['ticker'], row['filing_date'], trading)
    if car is not None and len(car) == 16:  # -5 to +10
        car_results.append({
            'ticker': row['ticker'],
            'fiscal_year': row['fiscal_year'],
            'lag_tercile': row['lag_tercile'],
            'car': car
        })

car_df = pd.DataFrame(car_results)
print(f"Computed CARs for {len(car_df)} firm-year events")
```

```{python}
#| label: fig-event-study
#| eval: false
#| fig-cap: "Cumulative abnormal returns around the filing date, by reporting timeliness tercile. Early filers experience positive CARs, consistent with the good-news-early hypothesis of @givoly1982timeliness. Late filers exhibit negative drift, suggesting that delayed filing conveys unfavourable information."
#| code-summary: "Plot CAR paths by timeliness tercile"

event_days = range(-5, 11)

fig, ax = plt.subplots(figsize=(10, 6))

colors = {'Early': '#27AE60', 'Middle': '#F1C40F', 'Late': '#C0392B'}
for tercile in ['Early', 'Middle', 'Late']:
    subset = car_df[car_df['lag_tercile'] == tercile]
    if len(subset) > 0:
        avg_car = np.mean(np.stack(subset['car'].values), axis=0)
        se_car = np.std(np.stack(subset['car'].values), axis=0) / np.sqrt(len(subset))

        ax.plot(event_days, avg_car, color=colors[tercile],
                linewidth=2, label=tercile)
        ax.fill_between(event_days,
                        avg_car - 1.96 * se_car,
                        avg_car + 1.96 * se_car,
                        color=colors[tercile], alpha=0.15)

ax.axvline(x=0, color='gray', linestyle='--', linewidth=0.8)
ax.axhline(y=0, color='gray', linewidth=0.5)
ax.set_xlabel('Event Day (Relative to Filing Date)')
ax.set_ylabel('Cumulative Abnormal Return')
ax.set_title('Market Reaction Around Filing Date by Timeliness')
ax.legend(title='Filing Tercile')
plt.tight_layout()
plt.show()
```

## Filing Timeliness and Earnings Quality {#sec-dis-qual-timeliness-quality}

@givoly1982timeliness and @chambers1984timeliness establish that the content of disclosed information is correlated with its timing. We test this link formally: do late filers have worse earnings quality?

```{python}
#| label: fig-timeliness-quality
#| fig-cap: "Relationship between filing timeliness and earnings quality. Late-filing firms (right side) exhibit higher accruals volatility (lower earnings quality) and lower earnings persistence, consistent with the hypothesis that filing delays signal accounting difficulties."
#| code-summary: "Analyze relationship between filing lag and earnings quality"
#| eval: false

tq_panel = (
    annual_filings[['ticker', 'fiscal_year', 'lag_tercile', 'reporting_lag']]
    .merge(aq_df, on=['ticker', 'fiscal_year'], how='inner')
    .merge(persistence_df[['ticker', 'persistence']], on='ticker', how='left')
)

# Fix the tercile order explicitly so bar heights match the axis labels
tercile_order = ['Early', 'Middle', 'Late']

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Panel A: Accruals quality by tercile
aq_by_tercile = (
    tq_panel.groupby('lag_tercile')['accruals_quality'].mean()
    .reindex(tercile_order)
)
axes[0].bar(
    range(3), aq_by_tercile.values,
    color=['#27AE60', '#F1C40F', '#C0392B'],
    alpha=0.85, edgecolor='white'
)
axes[0].set_xticks(range(3))
axes[0].set_xticklabels(tercile_order)
axes[0].set_ylabel('Accruals Quality (σ of DD Residuals)')
axes[0].set_title('Panel A: Accruals Quality by Filing Tercile')
axes[0].text(0.05, 0.95, 'Higher = lower quality',
             transform=axes[0].transAxes, fontsize=9,
             verticalalignment='top', style='italic', color='gray')

# Panel B: Persistence by tercile
per_by_tercile = (
    tq_panel.groupby('lag_tercile')['persistence'].mean()
    .reindex(tercile_order)
)
axes[1].bar(
    range(3), per_by_tercile.values,
    color=['#27AE60', '#F1C40F', '#C0392B'],
    alpha=0.85, edgecolor='white'
)
axes[1].set_xticks(range(3))
axes[1].set_xticklabels(tercile_order)
axes[1].set_ylabel('Earnings Persistence (AR(1) Coefficient)')
axes[1].set_title('Panel B: Earnings Persistence by Filing Tercile')

plt.tight_layout()
plt.show()
```

We formalize this with a regression that controls for firm characteristics:

```{python}
#| label: timeliness-quality-regression
#| eval: false
#| code-summary: "Regression: accruals quality on filing lag with controls"

tq_panel_reg = tq_panel.merge(
    financials[['ticker', 'fiscal_year', 'total_assets', 'net_income',
                'total_equity']],
    on=['ticker', 'fiscal_year'], how='left'
).merge(governance, on=['ticker', 'fiscal_year'], how='left')

tq_panel_reg['log_size'] = np.log(tq_panel_reg['total_assets'])
tq_panel_reg['roa'] = tq_panel_reg['net_income'] / tq_panel_reg['total_assets']
tq_panel_reg['late'] = (tq_panel_reg['lag_tercile'] == 'Late').astype(int)

# Drop incomplete rows first so the cluster groups align with the estimation sample
reg_data = tq_panel_reg.dropna(subset=[
    'accruals_quality', 'late', 'log_size', 'roa',
    'state_ownership_pct', 'big4_auditor', 'fiscal_year', 'ticker'
])

model_tq = smf.ols(
    'accruals_quality ~ late + log_size + roa + state_ownership_pct '
    '+ big4_auditor + C(fiscal_year)',
    data=reg_data
).fit(cov_type='cluster', cov_kwds={'groups': reg_data['ticker']})

print("Accruals Quality ~ Late Filing:")
print(model_tq.summary().tables[1])
```

::: callout-note
## Endogeneity Caveat

The association between filing timeliness and earnings quality is likely endogenous: firms with complex accounting issues take longer to prepare financial statements, and the same complexity drives lower earnings quality. The filing lag is thus best interpreted as an *observable signal* of underlying accounting difficulty rather than a causal determinant. Instrumental variable approaches (e.g., using auditor busyness during peak filing season as an instrument for filing lag) can partially address this concern.
:::

## Disclosure Quality and Investment Efficiency {#sec-dis-qual-investment}

@biddle2009does demonstrate that higher financial reporting quality is associated with more efficient investment. Specifically, it is associated with less over-investment (in firms with excess cash) and less under-investment (in firms that are financially constrained). The mechanism is that better disclosure reduces information asymmetry between managers and capital providers, improving the allocation of capital.

We test this prediction in Vietnam using the @biddle2009does framework:

$$
\text{Investment}_{i,t+1} = \alpha + \beta_1 \text{SalesGrowth}_{i,t} + \varepsilon_{i,t+1}
$$ {#eq-inv-efficiency}

The residual $\hat{\varepsilon}_{i,t+1}$ measures deviation from expected investment. Positive residuals indicate over-investment; negative residuals indicate under-investment. We then test whether the absolute value of this residual is lower for firms with higher disclosure quality.
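The two-step logic of @eq-inv-efficiency can be sketched on simulated data: fit expected investment from prior-year sales growth, then use the sign and magnitude of the residual. Everything in this example is a hypothetical illustration, including the data-generating process, the column names, and the use of `np.polyfit` in place of a regression package.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200

# Hypothetical firm-years: sales growth in year t, investment in year t+1
toy = pd.DataFrame({'sales_growth': rng.normal(0.10, 0.20, n)})
toy['investment_next'] = (
    0.05 + 0.30 * toy['sales_growth'] + rng.normal(0, 0.08, n)
)

# First stage: expected investment given lagged sales growth
slope, intercept = np.polyfit(toy['sales_growth'], toy['investment_next'], deg=1)
toy['expected'] = intercept + slope * toy['sales_growth']
toy['residual'] = toy['investment_next'] - toy['expected']

# The sign gives the direction of the deviation,
# the absolute value gives the inefficiency measure
toy['over_invest'] = toy['residual'] > 0
toy['abs_residual'] = toy['residual'].abs()

print(toy[['residual', 'over_invest', 'abs_residual']].describe(include='all'))
```

The `abs_residual` column plays the role of the dependent variable in the second-stage test, where it would be regressed on disclosure quality and controls.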
```{python}
#| label: investment-efficiency
#| eval: false
#| code-summary: "Estimate investment efficiency and test disclosure quality link"

inv_panel = financials.sort_values(['ticker', 'fiscal_year']).copy()

# Investment = change in total assets / lagged total assets
inv_panel['investment'] = (
    inv_panel.groupby('ticker')['total_assets'].pct_change()
)
inv_panel['sales_growth'] = (
    inv_panel.groupby('ticker')['revenue'].pct_change()
)
# Lag sales growth by one year so the regression matches the equation:
# investment in year t+1 on sales growth in year t
inv_panel['sales_growth_lag'] = (
    inv_panel.groupby('ticker')['sales_growth'].shift(1)
)
# Zero lagged values produce infinite growth rates; treat them as missing
inv_panel = inv_panel.replace([np.inf, -np.inf], np.nan)

# Expected investment model
inv_model = smf.ols(
    'investment ~ sales_growth_lag',
    data=inv_panel
).fit()

# Residuals align on the index; rows dropped by the model stay missing
inv_panel['inv_residual'] = inv_model.resid
inv_panel['abs_inv_residual'] = inv_panel['inv_residual'].abs()

# Merge with disclosure quality
inv_eff = (
    inv_panel[['ticker', 'fiscal_year', 'abs_inv_residual',
               'investment', 'total_assets']]
    .merge(quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True),
           on=['ticker', 'fiscal_year'], how='inner')
)
inv_eff['log_size'] = np.log(inv_eff['total_assets'])
inv_eff = inv_eff.set_index(['ticker', 'fiscal_year'])

# Panel regression
inv_eff_model = PanelOLS(
    dependent=inv_eff['abs_inv_residual'],
    exog=sm.add_constant(inv_eff[['dq_index', 'log_size']]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("Investment Inefficiency ~ Disclosure Quality:")
print(inv_eff_model.summary)
```

A negative coefficient on `dq_index` indicates that higher disclosure quality is associated with lower investment inefficiency: firms with better disclosure make investment decisions closer to what their growth opportunities warrant.

## Vietnamese Institutional Context {#sec-dis-qual-vietnam-context}

### State Ownership and Disclosure

SOEs account for a substantial share of Vietnamese market capitalization. The relationship between state ownership and disclosure quality is theoretically ambiguous.
On the one hand, political connections may reduce the pressure to disclose transparently; government shareholders may tolerate opacity that private shareholders would not. On the other hand, post-equitization monitoring by multiple stakeholders (MOF, SCIC, minority shareholders) may create competing disclosure demands.

```{python}
#| label: soe-disclosure
#| eval: false
#| code-summary: "Compare disclosure quality between SOEs and private firms"

soe_panel = (
    quality_panel[['ticker', 'fiscal_year', 'dq_index', 'reporting_lag']].reset_index(drop=True)
    .merge(governance[['ticker', 'fiscal_year', 'state_ownership_pct']],
           on=['ticker', 'fiscal_year'], how='inner')
)
soe_panel['soe'] = (soe_panel['state_ownership_pct'] >= 50).astype(int)
soe_panel['soe_label'] = soe_panel['soe'].map(
    {1: 'SOE (≥50%)', 0: 'Private (<50%)'}
)

# Compare means
comparison = (
    soe_panel
    .groupby('soe_label')
    .agg(
        n=('dq_index', 'count'),
        mean_dq=('dq_index', 'mean'),
        median_dq=('dq_index', 'median'),
        mean_lag=('reporting_lag', 'mean'),
        median_lag=('reporting_lag', 'median')
    )
    .round(3)
)

print("SOE vs Private Firm Disclosure Comparison:")
print(comparison)

# Formal t-test
soe_dq = soe_panel[soe_panel['soe'] == 1]['dq_index']
priv_dq = soe_panel[soe_panel['soe'] == 0]['dq_index']
t_stat, p_val = stats.ttest_ind(soe_dq.dropna(), priv_dq.dropna())
print(f"\nt-test: t = {t_stat:.3f}, p = {p_val:.4f}")
```

### IFRS Convergence and Disclosure Quality

Vietnam has been pursuing a phased convergence toward IFRS, with the Ministry of Finance issuing a roadmap for voluntary adoption by large listed firms. The transition from VAS to IFRS-aligned standards is expected to expand disclosure requirements, particularly for financial instruments (IFRS 9), revenue recognition (IFRS 15), and leases (IFRS 16). @barth2008international provide evidence that IFRS adoption is associated with improvements in earnings quality and disclosure, though the effect depends on enforcement strength.
We can exploit the staggered timing of voluntary IFRS adoption across Vietnamese firms as a quasi-experiment:

```{python}
#| label: ifrs-adoption
#| eval: false
#| code-summary: "Difference-in-differences: IFRS adoption and disclosure quality"

# Assume DataCore provides IFRS adoption dates
ifrs_adoption = client.get_ifrs_adoption(
    exchanges=['HOSE', 'HNX'],
    fields=['ticker', 'ifrs_adoption_year']
)

# Merge with quality panel
ifrs_panel = (
    quality_panel[['ticker', 'fiscal_year', 'dq_index']].reset_index(drop=True)
    .merge(ifrs_adoption, on='ticker', how='left')
)

# Treatment indicator
# Never-adopters have a missing adoption year, so the comparison is False (0)
ifrs_panel['post_ifrs'] = (
    ifrs_panel['fiscal_year'] >= ifrs_panel['ifrs_adoption_year']
).astype(int)
ifrs_panel['treated'] = ifrs_panel['ifrs_adoption_year'].notna().astype(int)

# Simple DiD
ifrs_panel = ifrs_panel.set_index(['ticker', 'fiscal_year'])

did_model = PanelOLS(
    dependent=ifrs_panel['dq_index'],
    exog=sm.add_constant(ifrs_panel[['post_ifrs']]),
    entity_effects=True,
    time_effects=True,
    check_rank=False
).fit(cov_type='clustered', cluster_entity=True)

print("DiD: IFRS Adoption and Disclosure Quality:")
print(did_model.summary)
```

::: callout-note
## Identification Concern

Voluntary IFRS adoption is endogenous because firms that choose to adopt early may already have higher-quality disclosure. The two-way fixed effects DiD absorbs time-invariant firm characteristics and common time trends, but cannot fully address selection on time-varying unobservables. Researchers should consider matching estimators (e.g., propensity score matching on pre-adoption characteristics) or instrumental variable approaches as robustness checks.
:::

## Predicting Late Filings {#sec-dis-qual-prediction}

Can we predict which firms will file late? This is valuable for portfolio construction (avoiding potential bad-news firms) and for regulators (targeting enforcement resources).
We use a logistic model with financial and governance predictors:

$$
\Pr(\text{Late}_{i,t} = 1) = \Lambda\left(\alpha + \boldsymbol{\beta}'\mathbf{X}_{i,t-1}\right)
$$ {#eq-prediction}

where $\Lambda(\cdot)$ is the logistic function and $\mathbf{X}_{i,t-1}$ are lagged predictors.

```{python}
#| label: predict-late-filing
#| eval: false
#| code-summary: "Logistic model to predict late filings"

from sklearn.metrics import roc_auc_score, classification_report
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pred_panel = (
    annual_filings[['ticker', 'fiscal_year', 'late_filing']]
    .merge(financials, on=['ticker', 'fiscal_year'], how='left')
    .merge(governance, on=['ticker', 'fiscal_year'], how='left')
)

# Lagged predictors
pred_panel = pred_panel.sort_values(['ticker', 'fiscal_year'])
for col in ['total_assets', 'net_income', 'operating_cash_flow',
            'total_equity', 'revenue']:
    pred_panel[f'{col}_lag'] = pred_panel.groupby('ticker')[col].shift(1)

pred_panel['log_size_lag'] = np.log(pred_panel['total_assets_lag'])
pred_panel['roa_lag'] = (
    pred_panel['net_income_lag'] / pred_panel['total_assets_lag']
)
pred_panel['leverage_lag'] = (
    (pred_panel['total_assets_lag'] - pred_panel['total_equity_lag'])
    / pred_panel['total_assets_lag']
)
pred_panel['cfo_ratio_lag'] = (
    pred_panel['operating_cash_flow_lag'] / pred_panel['total_assets_lag']
)

# Previous late filing indicator
pred_panel['prev_late'] = (
    pred_panel.groupby('ticker')['late_filing'].shift(1)
)

features = [
    'log_size_lag', 'roa_lag', 'leverage_lag', 'cfo_ratio_lag',
    'state_ownership_pct', 'foreign_ownership_pct',
    'big4_auditor', 'board_independence_pct', 'prev_late'
]

clean = pred_panel.dropna(subset=features + ['late_filing'])
X = clean[features]
y = clean['late_filing']

# Logistic regression with cross-validation
# Scaling sits inside the pipeline so each fold is standardized
# using its own training data only
lr = LogisticRegression(max_iter=1000, penalty='l2', C=1.0)
cv_scores = cross_val_score(
    make_pipeline(StandardScaler(), lr), X, y, cv=5, scoring='roc_auc'
)
print(f"5-Fold Cross-Validated AUC: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")

# Standardize and fit on full sample for coefficient interpretation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
lr.fit(X_scaled, y)
coef_df = pd.DataFrame({
    'Feature': features,
    'Coefficient': lr.coef_[0],
    'Odds Ratio': np.exp(lr.coef_[0])
}).sort_values('Coefficient', ascending=False)

print("\nLogistic Regression Coefficients:")
print(coef_df.to_string(index=False))
```

```{python}
#| label: fig-roc-curve
#| eval: false
#| fig-cap: "ROC curve for the late-filing prediction model. The model achieves reasonable discriminative power, with previous late filing status and firm size as the strongest predictors."
#| code-summary: "Plot ROC curve for the late-filing prediction model"

from sklearn.metrics import roc_curve, auc

# In-sample ROC: fitted and evaluated on the same observations,
# so it is optimistic relative to the cross-validated AUC
lr.fit(X_scaled, y)
y_prob = lr.predict_proba(X_scaled)[:, 1]
fpr, tpr, _ = roc_curve(y, y_prob)
roc_auc = auc(fpr, tpr)

fig, ax = plt.subplots(figsize=(7, 7))
ax.plot(fpr, tpr, color='#2C5F8A', linewidth=2,
        label=f'Logistic Model (AUC = {roc_auc:.3f})')
ax.plot([0, 1], [0, 1], color='gray', linestyle='--', linewidth=1)
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('Late Filing Prediction: ROC Curve')
ax.legend(loc='lower right')
ax.set_aspect('equal')
plt.tight_layout()
plt.show()
```

## Summary {#sec-dis-qual-summary}

This chapter has examined corporate disclosure quality and timing in Vietnam along several dimensions.
The key findings and methodological contributions are summarized in @tbl-dis-qual-summary.

| Theme | Key Result | Reference |
|------------------------|------------------------|------------------------|
| Good news early | Early filers earn positive CARs around filing dates | @givoly1982timeliness |
| Textual quality | Forward-looking density and numerical specificity vary substantially | @li2008annual |
| Composite DQ index | Foreign ownership and Big 4 auditors are strongest determinants | @botosan1997disclosure |
| Cost of capital | Higher DQ is associated with lower implied cost of equity | @diamond1991disclosure |
| Liquidity | Higher DQ firms have lower Amihud illiquidity | @lang2012transparency |
| Investment efficiency | Higher DQ reduces absolute investment residuals | @biddle2009does |
| Strategic timing | Evidence of bad-news clustering on high-congestion days | @hirshleifer2009driven |
| IFRS adoption | Preliminary evidence of DQ improvement post-adoption | @barth2008international |

: Summary of findings by theme. {#tbl-dis-qual-summary}

The Vietnamese disclosure environment is shaped by a combination of regulatory mandates (Circular 155, Securities Law 2019), enforcement capacity (SSC penalties and trading suspensions), and firm-level incentives (ownership structure, auditor choice, governance quality). As Vietnam continues its IFRS convergence and capital market development, the information environment is expected to evolve, creating opportunities for researchers to study the dynamics of disclosure quality in a rapidly changing institutional setting.