Administrative Efficiency Report

This section provides an overview of the administrative barriers to the implementation of public procurement. Such barriers may arise from limited resources available to public buyers, insufficient staff capacity (e.g., staff shortages or low qualifications), an overload of procurement procedures, or inadequate regulation. The analysis focuses on identifying the most common symptoms of these administrative barriers, including the distribution of procurement procedure types, the occurrence of delays, the proportion of cancelled tenders, and other related indicators.

Is there an overuse of some procedure types?

Requires manual filling

Regulatory overview of procedure type thresholds and conditions

Procedure Type	Level of competition	Conditions / When used	Threshold (value / complexity)	Deadline requirements (Submission / Decision)
Open Procedure	Competitive	Default choice, anyone can bid.	Mandatory above X value, optional below.	Submission: ___ / Decision: ___
Restricted Procedure / approaching bidders	Competitive but limited	Used for complex or high-value tenders where only qualified firms are invited to submit full offers.	Usually higher value / complex tenders.	Submission: ___ / Decision: ___
Negotiated Procedure with Prior Publication	Competitive but limited	Complex projects which cannot have strictly defined requirements in advance; pre-selection of capable suppliers.	Usually higher value / complex tenders.	Submission: ___ / Decision: ___
Competitive Dialogue	Competitive but limited	The contract is particularly complex and the authority cannot define the technical means to meet its needs, or the legal or financial make-up of the project; pre-selection of capable suppliers.	Usually higher value / complex tenders.	Submission: ___ / Decision: ___
Concession	Competitive but limited	Remuneration for the supplier comes from revenues generated by operating the service or infrastructure; pre-selection of capable suppliers.	Usually complex infrastructural / service projects.	Submission: ___ / Decision: ___
Innovation Partnership	Competitive but limited	Joint R&D and purchase of innovative product; competitive entry and long-term partnership after; pre-selection of capable suppliers.	Complex / innovative projects.	Submission: ___ / Decision: ___
Design contest	Competitive	Open submission of project designs (architecture, urban planning, engineering, etc.).	Projects requiring design submission.	Submission: ___ / Decision: ___
Mini-tender	Competitive but limited	Second stage in framework agreements or DPS for pre-selected suppliers; invitation sent only to framework / DPS suppliers qualified for that lot/category.	Invitation sent only to suppliers qualified for that lot.	Submission: ___ / Decision: ___
Negotiated Procedure without Prior Publication	Non-competitive	Contracting authority directly approaches one or more suppliers under exceptional, legally defined conditions. Allowed only under strict circumstances (no competition, extreme urgency, or failed competitive tender).	Allowed only under strict legal circumstances.	Submission: ___ / Decision: ___
Direct Award / Single Source	Non-competitive	Direct purchase from supplier without competition. Used for very low-value purchases or specific exceptions.	Below X value or specific exceptions.	Submission: ___ / Decision: ___

Tendering techniques

Tendering technique	Level of competition	Conditions / When used	Threshold / relation to procedure	Deadlines / relation to procedure
Dynamic Purchasing System (DPS)	Competitive	Electronic system for recurring, standard items. Suppliers can apply to join at any time.	Same as for open / mini-tender.	Same as for open / mini-tender.
Framework Agreement	Competitive but limited	Long-term contract arrangement between one or more contracting authorities and one or more suppliers that sets terms (price, quantities, quality, etc.) for future purchases. Can result in a single supplier (no competition at second stage) or multiple suppliers (mini-tender at second stage).	Depending on the underlying procedure(s) used.	Depending on the underlying procedure(s).

Source: relevant regulations; MAPS, etc.

Country-specific results

Bulgaria (BG)

Is there an overuse of some procedure types?

Procedure type distribution – BG

This panel shows how tenders are distributed across different procedure types. The left plot displays the share of each procedure type, while the right shows absolute counts. A high reliance on non-competitive procedures (such as direct awards or negotiated without publication) may indicate limited competition or potential misuse of exceptional procurement rules.
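The shares and counts shown in this panel can be tabulated directly. The following is a minimal base R sketch, not the pipeline's own plotting code: the column name `tender_proceduretype` is an assumption, and the toy rows are illustrative only, not results.

```r
# Toy data for illustration only; `tender_proceduretype` is an assumed column name.
tenders <- data.frame(
  tender_proceduretype = c("OPEN", "OPEN", "OPEN", "OUTRIGHT_AWARD",
                           "NEGOTIATED_WITHOUT_PUBLICATION", "RESTRICTED",
                           "OPEN", "OUTRIGHT_AWARD"),
  stringsAsFactors = FALSE
)

# Absolute counts per procedure type (right-hand plot)
counts <- table(tenders$tender_proceduretype)

# Shares per procedure type (left-hand plot)
shares <- prop.table(counts)

# Combined share of the non-competitive procedure types
non_competitive <- c("OUTRIGHT_AWARD", "NEGOTIATED_WITHOUT_PUBLICATION")
share_non_competitive <- sum(shares[names(shares) %in% non_competitive])

procedure_mix <- data.frame(
  procedure_type = names(counts),
  n_tenders      = as.integer(counts),
  share          = as.numeric(shares)
)

# Largest procedure types first
procedure_mix[order(-procedure_mix$n_tenders), ]
```

Reporting the combined non-competitive share alongside the per-type shares makes it easier to compare countries whose procedure labels differ.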

Are submission periods too short?

Overall submission period distribution – BG

This histogram shows the distribution of submission periods (days from call for tender to bid deadline). Vertical lines mark the 25th percentile (Q1), median, and 75th percentile (Q3). Very short submission periods may prevent suppliers from preparing competitive bids, reducing competition and potentially favoring incumbents who are already familiar with the buyer’s requirements.
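As a rough illustration of how the submission period and its quartile markers could be derived, the sketch below uses the two date columns that appear in the pipeline code. The dates are toy values, and dropping negative periods is an assumption rather than a documented pipeline rule.

```r
# Toy dates for illustration only; the column names follow the pipeline code.
tenders <- data.frame(
  tender_publications_firstcallfortenderdate =
    c("2021-01-04", "2021-02-01", "2021-03-10", "2021-04-01", "2021-05-03"),
  tender_biddeadline =
    c("2021-01-14", "2021-03-03", "2021-03-25", "2021-05-06", "2021-05-08"),
  stringsAsFactors = FALSE
)

call_date <- as.Date(tenders$tender_publications_firstcallfortenderdate)
deadline  <- as.Date(tenders$tender_biddeadline)

# Submission period: days from call for tender to bid deadline
tenders$submission_days <- as.numeric(deadline - call_date)

# Assumption: negative or missing periods are treated as data errors and dropped
valid <- tenders$submission_days[!is.na(tenders$submission_days) &
                                   tenders$submission_days >= 0]

# Q1, median and Q3, i.e. the vertical lines in the histogram
quartiles <- stats::quantile(valid, probs = c(0.25, 0.50, 0.75))
quartiles
```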

Submission periods by procedure type – BG

This faceted histogram breaks down submission periods by procedure type. Each panel shows quartiles for that specific procedure. More complex procedures (restricted, negotiated with publication) typically require longer preparation times, while open procedures should still allow sufficient time for suppliers to respond. Compare these distributions to regulatory minimums to identify procedure types where submission windows are systematically too short.

Short vs. normal submission periods – BG

This histogram highlights submission periods flagged as unusually short (in red) compared to normal periods (in blue). The flagging is based on country-specific thresholds or, if unavailable, the median for each procedure type. A high proportion of short submission periods may indicate that buyers are not allowing adequate time for competitive bidding, which can reduce the number of bidders and lead to less competitive outcomes.
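The flagging rule described above (a country-specific threshold where available, otherwise the median for the procedure type) can be sketched as follows. The function name `flag_short_submission`, the 15-day threshold, and the choice of a strict "below the cut-off" comparison are all assumptions made for illustration.

```r
# Hypothetical helper: flags submission periods below a per-procedure cut-off.
flag_short_submission <- function(days, procedure, thresholds = NULL) {
  procedure <- as.character(procedure)

  # Fallback cut-off: median submission period within each procedure type
  cutoff <- ave(days, procedure,
                FUN = function(v) stats::median(v, na.rm = TRUE))

  # Override with a supplied threshold where one exists for the procedure type
  if (!is.null(thresholds)) {
    has_threshold <- procedure %in% names(thresholds)
    cutoff[has_threshold] <- unname(thresholds[procedure[has_threshold]])
  }

  # Assumption: "short" means strictly below the cut-off
  days < cutoff
}

# Toy input; the 15-day threshold for open procedures is illustrative only
days      <- c(10, 30, 15, 35, 5, 20)
procedure <- c("OPEN", "OPEN", "OPEN", "RESTRICTED", "RESTRICTED", "RESTRICTED")
short     <- flag_short_submission(days, procedure, thresholds = c(OPEN = 15))
short
```

Note that the median fallback flags close to half of the tenders in a procedure type by construction, so shares of short periods computed that way are best read as relative rather than absolute indicators.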

Which buyers set the shortest submission periods?

Short submission periods by buyer group – BG

These stacked bar charts show which buyer groups most frequently use short submission periods. The left panel displays counts, while the right panel shows contract values. If certain buyer types (e.g., national vs. regional) systematically use short deadlines, this may point to specific capacity constraints or procurement practices that merit closer examination. High-value contracts with short deadlines are particularly concerning as they affect larger market segments.
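The two panels correspond to two aggregations of the same flag: one by number of contracts and one by contract value. A minimal base R sketch, assuming hypothetical columns `buyer_group`, `short_deadline` and `contract_value` and using toy values:

```r
# Toy data for illustration only; all three column names are assumptions.
tenders <- data.frame(
  buyer_group    = c("National Buyer", "National Buyer", "Regional Buyer",
                     "Regional Buyer", "Utilities"),
  short_deadline = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  contract_value = c(100, 400, 50, 150, 300)
)

# Left panel: number of contracts by buyer group and flag
by_count <- table(tenders$buyer_group, tenders$short_deadline)

# Right panel: total contract value by buyer group and flag
by_value <- xtabs(contract_value ~ buyer_group + short_deadline, data = tenders)

# Share of short-deadline contracts within each buyer group
share_short_count <- prop.table(by_count, margin = 1)[, "TRUE"]
share_short_value <- prop.table(by_value, margin = 1)[, "TRUE"]

round(cbind(share_short_count, share_short_value), 2)
```

In this toy example, half of the national buyer's contracts have short deadlines but they account for only 20% of its contract value, which is the kind of divergence between the two panels worth checking.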

Are decision periods too long?

Overall decision period distribution – BG

This histogram shows the distribution of decision periods (days from award decision to contract signature). Long decision periods can delay project implementation and create uncertainty for suppliers. While some procedures naturally take longer due to complexity or legal requirements, consistently prolonged decisions may indicate administrative bottlenecks or lack of resources.

Decision periods by procedure type – BG

This faceted histogram shows decision periods broken down by procedure type, with quartile markers. This allows you to see whether more complex procedures take appropriately longer to evaluate, and whether simpler procedures are decided more quickly. Pay attention to procedure types where decisions are routinely much slower than others, or where complex tenders appear to be decided as quickly as simple ones.

Long vs. normal decision periods – BG

This histogram highlights decision periods flagged as unusually long (in red) compared to normal periods (in green). A high proportion of long decisions suggests systematic delays, which can undermine predictability for suppliers and stall public projects.

Which buyers have the longest decision periods?

Long decision periods by buyer group – BG

These stacked bar charts show which buyer groups most frequently experience long decision periods. The left panel displays contract counts, while the right panel shows contract values. High shares of delayed decisions in a specific buyer group may indicate bottlenecks in evaluation, internal approval, or oversight procedures that need to be addressed.

Is administrative efficiency linked to competition?

Effect of short submission periods on single bidding – BG

This figure summarizes how the likelihood of single bidding changes when submission periods are unusually short. The model-based prediction accounts for differences in buyer type, procedure type, and year. If the line for short deadlines sits noticeably higher, this suggests that tight timelines may be restricting competition and making single bidding more likely. The shaded area represents the 95% confidence interval around the prediction.

Figure not produced: the required data are unavailable or a key variable is missing.

Sensitivity Analysis: Short Submission Period Model – BG

This section tests the robustness of the short submission period finding across different model specifications. The pipeline runs multiple versions of the logit model with varying combinations of fixed effects (none, buyer, year, or both), clustering of standard errors (none, by buyer, by year, by buyer and year, or by buyer and buyer type), and control variables (procedure type, buyer type). Use this analysis to verify that the core finding about short submission periods and single bidding holds across reasonable modeling choices and is not driven by a single specification.

What is sensitivity analysis?

Sensitivity analysis tests whether the relationship between the timing indicator and single bidding holds across different modeling choices. A robust finding should be consistent across reasonable variations in:

Fixed effects (FE): Controls for unobserved differences across buyers, years, etc.
Clustering: Adjusts standard errors for correlation within groups
Control variables: Additional factors that might affect the outcome
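The specification counts reported in the tables of this section (40 in total, 10 per fixed-effects option, 8 per clustering option, 20 per control set) are consistent with a full grid over the three modeling choices. The sketch below builds such a grid; that the pipeline enumerates specifications exactly this way is an assumption.

```r
# Full grid of modeling choices, using the labels that appear in the result tables
spec_grid <- expand.grid(
  fe       = c("0", "buyer", "year", "buyer+year"),
  cluster  = c("none", "buyer", "year", "buyer_year", "buyer_buyertype"),
  controls = c("base", "x_only"),
  stringsAsFactors = FALSE
)

nrow(spec_grid)            # 4 x 5 x 2 specifications
table(spec_grid$fe)        # specifications per fixed-effects option
table(spec_grid$cluster)   # specifications per clustering option
table(spec_grid$controls)  # specifications per control set
```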

Overall Summary:

n_specs	share_positive	share_negative	median_estimate	median_pvalue	median_strength	share_p_le_0.05	share_p_le_0.1	share_p_le_0.2	share_sign_stable	n_nonzero
40	0	1	-0.198	0.375	0	0.1	0.1	0.225	1	40

How to interpret:

n_specs (40): Number of alternative model specifications tested
n_nonzero (40): Number of specifications with non-zero estimates
share_positive (0.0%): Percentage of specifications where the estimate is positive
share_negative (100.0%): Percentage of specifications where the estimate is negative
median_estimate (-0.198): Typical size of the estimated effect across specifications
median_pvalue (0.375): Typical p-value; lower values indicate stronger statistical evidence
median_strength (0.000): Effect size when moving from 10th to 90th percentile of the timing indicator
share_p_le_0.05 (10.0%): Percentage of specs statistically significant at 5% level
share_p_le_0.1 (10.0%): Percentage of specs statistically significant at 10% level
share_p_le_0.2 (22.5%): Percentage of specs statistically significant at 20% level
share_sign_stable (Yes): Whether all non-zero estimates have the same sign (positive or negative)

Key takeaway:
- Only 10% of specifications (4 of 40) are significant at p ≤ 0.10, suggesting the finding is sensitive to modeling choices.
- The estimate is negative in all specifications, so the direction of the effect is consistent.
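The takeaway wording in this report follows three bands for the share of specifications significant at the 10% level (below 40%, 40 to 60%, above 60%). A hypothetical helper that reproduces this banding is sketched below; the function name and the treatment of the exact boundary values are assumptions.

```r
# Hypothetical helper mapping share_p_le_0.1 to a robustness label.
# Assumption: 0.40 and 0.60 themselves fall in the middle band.
robustness_label <- function(share_p10) {
  if (is.na(share_p10)) {
    return(NA_character_)
  }
  if (share_p10 > 0.60) {
    "robust"
  } else if (share_p10 >= 0.40) {
    "moderately robust"
  } else {
    "sensitive to modeling choices"
  }
}

# Shares reported in the Overall Summary tables of this report
shares <- c(bg_short = 0.10, bg_long = 1.00, uy_short = 0.00, uy_long = 0.45)
labels <- vapply(shares, robustness_label, character(1))
labels
```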

Breakdown by modeling choice:

Results by Fixed Effects

fe	n_specs	share_p10	median_p
0	10	0.2	0.308
year	10	0.2	0.222
buyer	10	0.0	0.850
buyer+year	10	0.0	0.638

Results by Clustering Method

cluster	n_specs	share_p10	median_p
none	8	0.5	0.100
buyer	8	0.0	0.389
buyer_buyertype	8	0.0	0.445
year	8	0.0	0.501
buyer_year	8	0.0	0.512

Results by Control Variables

controls	n_specs	share_positive	share_p10	median_p	median_strength
base	20	0	0.1	0.413	0
x_only	20	0	0.1	0.337	0

Classification of Specifications

class	n	share
Negative & significant	4	0.1
Negative but insignificant	36	0.9

Top Performing Specification Combinations

fe	cluster	controls	n	share_pok	median_p	median_est
year	none	base	1	1	0.000	-0.342
year	none	x_only	1	1	0.000	-0.288
0	none	base	1	1	0.001	-0.287
0	none	x_only	1	1	0.001	-0.254
year	buyer	base	1	0	0.110	-0.342
year	buyer	x_only	1	0	0.167	-0.288
0	buyer	base	1	0	0.193	-0.287
year	buyer_buyertype	base	1	0	0.196	-0.342
buyer+year	none	x_only	1	0	0.199	-0.142
0	buyer	x_only	1	0	0.247	-0.254

Effect of long decision periods on single bidding – BG

This figure assesses whether very long decision periods are associated with single bidding. The horizontal axis compares tenders decided within a normal time frame to those with delayed decisions. The vertical axis shows the predicted probability of single bidding, after accounting for buyer type and year. A higher predicted probability for delayed decisions may indicate strategic behavior or weakened competitive pressure in tenders that drag on.
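To make the idea of a model-based predicted probability concrete, the sketch below fits a plain logit with `stats::glm` on simulated data and predicts single bidding for a typical tender with and without a delayed decision. It is a simplified stand-in rather than the pipeline's fixest-based model: the variable names, the simulated effect size, and the normal-approximation 95% interval are all assumptions for illustration.

```r
set.seed(1)
n <- 400

# Simulated data for illustration only; not derived from any real tenders
toy <- data.frame(
  long_decision = rbinom(n, 1, 0.3),
  buyer_type    = sample(c("National", "Regional"), n, replace = TRUE),
  tender_year   = factor(sample(2019:2021, n, replace = TRUE))
)
# Arbitrary data-generating process: delayed decisions raise single bidding
toy$single_bid <- rbinom(n, 1, plogis(-1 + 0.5 * toy$long_decision))

# Logit of single bidding on the timing flag, buyer type and year
fit <- glm(single_bid ~ long_decision + buyer_type + tender_year,
           family = binomial(), data = toy)

# Typical tender: only the timing flag changes between the two rows
newdata <- data.frame(
  long_decision = c(0, 1),
  buyer_type    = "National",
  tender_year   = factor("2020", levels = levels(toy$tender_year))
)

# Predict on the link scale, then transform so the interval stays within [0, 1]
pred  <- predict(fit, newdata = newdata, type = "link", se.fit = TRUE)
p     <- plogis(pred$fit)
lower <- plogis(pred$fit - 1.96 * pred$se.fit)
upper <- plogis(pred$fit + 1.96 * pred$se.fit)

data.frame(decision = c("normal", "delayed"), predicted = p,
           lower = lower, upper = upper)
```

Building the interval on the link scale and transforming it afterwards keeps both bounds inside the unit interval, which a symmetric interval on the probability scale does not guarantee.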

Sensitivity Analysis: Long Decision Period Model – BG

This section tests the robustness of the long decision period finding across different model specifications. The pipeline runs multiple versions of the logit model with varying combinations of fixed effects (none, buyer, year, or both), clustering of standard errors (none, by buyer, by year, by buyer and year, or by buyer and buyer type), and control variables (procedure type, buyer type). Use this analysis to assess whether the relationship between long decision periods and single bidding is consistent across different modeling assumptions.

Overall Summary:

n_specs	share_positive	share_negative	median_estimate	median_pvalue	median_strength	share_p_le_0.05	share_p_le_0.1	share_p_le_0.2	share_sign_stable	n_nonzero
40	1	0	0.341	0.009	0.074	0.85	1	1	1	40

How to interpret:

n_specs (40): Number of alternative model specifications tested
n_nonzero (40): Number of specifications with non-zero estimates
share_positive (100.0%): Percentage of specifications where the estimate is positive
share_negative (0.0%): Percentage of specifications where the estimate is negative
median_estimate (0.341): Typical size of the estimated effect across specifications
median_pvalue (0.009): Typical p-value; lower values indicate stronger statistical evidence
median_strength (0.074): Effect size when moving from 10th to 90th percentile of the timing indicator
share_p_le_0.05 (85.0%): Percentage of specs statistically significant at 5% level
share_p_le_0.1 (100.0%): Percentage of specs statistically significant at 10% level
share_p_le_0.2 (100.0%): Percentage of specs statistically significant at 20% level
share_sign_stable (Yes): Whether all non-zero estimates have the same sign (positive or negative)

Key takeaway:
- All 40 specifications (100%) are significant at p ≤ 0.10, suggesting the finding is robust across modeling choices.
- The estimate is positive in all specifications, so the direction of the effect is consistent.

Breakdown by modeling choice:

Results by Fixed Effects

fe	n_specs	share_positive	share_p10	median_p	median_strength
0	10	1	1	0.033	0.064
buyer	10	1	1	0.003	0.148
buyer+year	10	1	1	0.009	0.080
year	10	1	1	0.010	0.041

Results by Clustering Method

cluster	n_specs	share_positive	share_p10	median_p	median_strength
none	8	1	1	0.000	0.074
buyer	8	1	1	0.004	0.074
buyer_buyertype	8	1	1	0.012	0.074
year	8	1	1	0.018	0.074
buyer_year	8	1	1	0.030	0.074

Results by Control Variables

controls	n_specs	share_positive	share_p10	median_p	median_strength
base	20	1	1	0.012	0.069
x_only	20	1	1	0.009	0.075

Classification of Specifications

class	n	share
Positive & significant	40	1

Top Performing Specification Combinations

fe	cluster	controls	n	share_pok	median_p	median_est
buyer	none	x_only	1	1	0.000	0.631
buyer	none	base	1	1	0.000	0.625
0	none	x_only	1	1	0.000	0.385
0	none	base	1	1	0.000	0.337
year	none	x_only	1	1	0.000	0.329
year	none	base	1	1	0.000	0.264
buyer+year	none	x_only	1	1	0.000	0.345
buyer+year	none	base	1	1	0.000	0.335
buyer	buyer	x_only	1	1	0.001	0.631
buyer	buyer	base	1	1	0.001	0.625

Uruguay (UY)

Is there an overuse of some procedure types?

Procedure type distribution – UY

Are submission periods too short?

Overall submission period distribution – UY

Submission periods by procedure type – UY

Short vs. normal submission periods – UY

Which buyers set the shortest submission periods?

Short submission periods by buyer group – UY

Are decision periods too long?

Overall decision period distribution – UY

Decision periods by procedure type – UY

Long vs. normal decision periods – UY

Which buyers have the longest decision periods?

Long decision periods by buyer group – UY

Is administrative efficiency linked to competition?

Effect of short submission periods on single bidding – UY

Figure not produced: the required data are unavailable or a key variable is missing.

Sensitivity Analysis: Short Submission Period Model – UY

Overall Summary:

n_specs	share_positive	share_negative	median_estimate	median_pvalue	median_strength	share_p_le_0.05	share_p_le_0.1	share_p_le_0.2	share_sign_stable	n_nonzero
40	0.625	0.375	0.011	0.517	0.003	0	0	0.05	0	40

How to interpret:

n_specs (40): Number of alternative model specifications tested
n_nonzero (40): Number of specifications with non-zero estimates
share_positive (62.5%): Percentage of specifications where the estimate is positive
share_negative (37.5%): Percentage of specifications where the estimate is negative
median_estimate (0.011): Typical size of the estimated effect across specifications
median_pvalue (0.517): Typical p-value; lower values indicate stronger statistical evidence
median_strength (0.003): Effect size when moving from 10th to 90th percentile of the timing indicator
share_p_le_0.05 (0.0%): Percentage of specs statistically significant at 5% level
share_p_le_0.1 (0.0%): Percentage of specs statistically significant at 10% level
share_p_le_0.2 (5.0%): Percentage of specs statistically significant at 20% level
share_sign_stable (No): Whether all non-zero estimates have the same sign (positive or negative)

Key takeaway:
- No specification (0 of 40) is significant at p ≤ 0.10, so there is no robust evidence of an effect.
- The sign of the estimate varies across specifications (62.5% positive, 37.5% negative), so the direction of the effect is uncertain.

Breakdown by modeling choice:

Results by Fixed Effects

fe	n_specs	share_positive	median_p	median_strength
0	10	1.0	0.517	0.011
buyer	10	0.5	0.925	-0.001
buyer+year	10	0.0	0.392	-0.013
year	10	1.0	0.547	0.009

Results by Clustering Method

cluster	n_specs	share_positive	median_p	median_strength
none	8	0.625	0.349	0.003
year	8	0.625	0.509	0.003
buyer_year	8	0.625	0.579	0.003
buyer_buyertype	8	0.625	0.637	0.003
buyer	8	0.625	0.648	0.003

Results by Control Variables

controls	n_specs	share_positive	share_p10	median_p	median_strength
base	20	0.75	0	0.454	0.007
x_only	20	0.50	0	0.745	0.001

Classification of Specifications

class	n	share
Negative but insignificant	15	0.375
Positive but insignificant	25	0.625

Top Performing Specification Combinations

fe	cluster	controls	n	median_p	median_est
0	none	base	1	0.162	0.057
year	year	base	1	0.182	0.057
buyer+year	none	x_only	1	0.201	-0.072
buyer+year	none	base	1	0.249	-0.065
year	none	base	1	0.252	0.057
buyer+year	buyer	x_only	1	0.335	-0.072
buyer+year	year	x_only	1	0.347	-0.072
year	buyer_year	base	1	0.357	0.057
buyer+year	buyer	base	1	0.392	-0.065
buyer+year	year	base	1	0.393	-0.065

Effect of long decision periods on single bidding – UY

Sensitivity Analysis: Long Decision Period Model – UY

Overall Summary:

n_specs	share_positive	share_negative	median_estimate	median_pvalue	median_strength	share_p_le_0.05	share_p_le_0.1	share_p_le_0.2	share_sign_stable	n_nonzero
40	1	0	0.112	0.11	0.026	0.3	0.45	0.675	1	40

How to interpret:

n_specs (40): Number of alternative model specifications tested
n_nonzero (40): Number of specifications with non-zero estimates
share_positive (100.0%): Percentage of specifications where the estimate is positive
share_negative (0.0%): Percentage of specifications where the estimate is negative
median_estimate (0.112): Typical size of the estimated effect across specifications
median_pvalue (0.110): Typical p-value; lower values indicate stronger statistical evidence
median_strength (0.026): Effect size when moving from 10th to 90th percentile of the timing indicator
share_p_le_0.05 (30.0%): Percentage of specs statistically significant at 5% level
share_p_le_0.1 (45.0%): Percentage of specs statistically significant at 10% level
share_p_le_0.2 (67.5%): Percentage of specs statistically significant at 20% level
share_sign_stable (Yes): Whether all non-zero estimates have the same sign (positive or negative)

Key takeaway:
- 45% of specifications (18 of 40) are significant at p ≤ 0.10, suggesting moderate robustness.
- The estimate is positive in all specifications, so the direction of the effect is consistent.

Breakdown by modeling choice:

Results by Fixed Effects

fe	n_specs	share_positive	share_p10	median_p	median_strength
buyer+year	10	1	0.8	0.036	0.028
year	10	1	0.7	0.041	0.035
buyer	10	1	0.2	0.193	0.013
0	10	1	0.1	0.225	0.022

Results by Clustering Method

cluster	n_specs	share_positive	share_p10	median_p	median_strength
none	8	1	0.750	0.012	0.026
buyer_buyertype	8	1	0.625	0.052	0.026
buyer	8	1	0.375	0.148	0.026
year	8	1	0.250	0.149	0.026
buyer_year	8	1	0.250	0.192	0.026

Results by Control Variables

controls	n_specs	share_positive	share_p10	median_p	median_strength
x_only	20	1	0.65	0.073	0.029
base	20	1	0.25	0.181	0.020

Classification of Specifications

class	n	share
Positive & significant	18	0.45
Positive but insignificant	22	0.55

Top Performing Specification Combinations

fe	cluster	controls	n	share_pok	median_p	median_est
year	none	x_only	1	1	0.000	0.180
0	none	x_only	1	1	0.001	0.116
buyer+year	none	x_only	1	1	0.002	0.153
year	buyer_buyertype	base	1	1	0.008	0.107
buyer+year	none	base	1	1	0.009	0.133
year	buyer_buyertype	x_only	1	1	0.011	0.180
buyer+year	buyer	x_only	1	1	0.013	0.153
year	none	base	1	1	0.015	0.107
buyer+year	buyer_buyertype	x_only	1	1	0.024	0.153
buyer+year	buyer	base	1	1	0.030	0.133


# ========================================================================
# Administrative efficiency pipeline
# ========================================================================
# The pipeline produces:
#  - Descriptive figures on procedure mix, submission and decision periods
#  - Buyer-level breakdowns of short and long periods (by count and value)
#  - Regression-based estimates linking timing to single bidding
# ========================================================================

# ------------------------------------------------------------------------
# 0. Data loading helper
# ------------------------------------------------------------------------

load_data <- function(input_path) {
  data <- data.table::fread(
    input            = input_path,
    keepLeadingZeros = TRUE,
    encoding         = "UTF-8",
    stringsAsFactors = FALSE,
    showProgress     = TRUE,
    na.strings       = c("", "-", "NA")
  )
  
  # Drop duplicated column names (keep the first occurrence)
  dup_cols <- duplicated(names(data))
  if (any(dup_cols)) {
    data <- data[, !dup_cols, with = FALSE]
  }
  
  data
}

# ------------------------------------------------------------------------
# 1. Tender year helper
# ------------------------------------------------------------------------
# Extracts year information from the main tender dates and stores it in
# a numeric `tender_year` column.

add_tender_year <- function(df) {
  df %>%
    dplyr::mutate(
      tender_year = dplyr::coalesce(
        stringr::str_extract(tender_publications_firstcallfortenderdate, "^\\d{4}"),
        stringr::str_extract(tender_awarddecisiondate, "^\\d{4}"),
        stringr::str_extract(tender_biddeadline, "^\\d{4}")
      ),
      tender_year = as.integer(tender_year)
    )
}

# ------------------------------------------------------------------------
# 2. Procedure type recoding
# ------------------------------------------------------------------------
# Maps raw procedure codes to a small, interpretable set of labels.
# These labels are used consistently in descriptive plots.

recode_procedure_type <- function(x) {
  dplyr::recode(
    as.character(x),
    "OPEN"                           = "Open Procedure",
    "OUTRIGHT_AWARD"                 = "Direct Award",
    "RESTRICTED"                     = "Restricted Procedure",
    "OTHER"                          = "Other Procedures",
    "COMPETITIVE_DIALOG"             = "Competitive Dialog",
    "NEGOTIATED"                     = "Negotiated",
    "NEGOTIATED_WITHOUT_PUBLICATION" = "Negotiated without publications",
    "NEGOTIATED_WITH_PUBLICATION"    = "Negotiated with publications",
    .default                         = as.character(x)
  )
}

# ------------------------------------------------------------------------
# 3. Buyer grouping helper
# ------------------------------------------------------------------------
# Collapses granular buyer types into five broad groups used in the
# administrative efficiency figures. Takes a vector of raw buyer types
# and returns a factor with a fixed level order.
add_buyer_group <- function(buyer_buyertype) {
  group <- dplyr::case_when(
    grepl("(?i)national",  buyer_buyertype) ~ "National Buyer",
    grepl("(?i)regional",  buyer_buyertype) ~ "Regional Buyer",
    grepl("(?i)utilities", buyer_buyertype) ~ "Utilities",
    grepl("(?i)European",  buyer_buyertype) ~ "EU agency",
    TRUE                                    ~ "Other Public Bodies"
  )
  
  factor(
    group,
    levels = c(
      "National Buyer",
      "Regional Buyer",
      "Utilities",
      "EU agency",
      "Other Public Bodies"
    )
  )
}

# ========================================================================
# SPECIFICATION TESTING AND SENSITIVITY ANALYSIS FUNCTIONS
# ========================================================================

#' Build fixed effects formula part
make_fe_part <- function(fe) {
  switch(
    fe,
    "0" = "0",
    "buyer" = "buyer_id",
    "year" = "tender_year",
    "buyer+year" = "buyer_id + tender_year",
    "buyer#year" = "buyer_id^tender_year",
    stop("Unknown FE spec: ", fe)
  )
}

#' Build cluster formula
make_cluster <- function(cluster) {
  switch(
    cluster,
    "none" = NULL,
    "buyer" = stats::as.formula("~ buyer_id"),
    "year" = stats::as.formula("~ tender_year"),
    "buyer_year" = stats::as.formula("~ buyer_id + tender_year"),
    "buyer_buyertype" = stats::as.formula("~ buyer_id + buyer_buyertype"),
    stop("Unknown cluster spec: ", cluster)
  )
}

#' Safe fixest model fitting
safe_fixest <- function(expr) {
  tryCatch(expr, error = function(e) NULL)
}

#' Extract effect from fixest model
extract_effect_fixest <- function(model, x_name, data_used, y_name = NULL) {
  s <- summary(model)
  ct <- s$coeftable
  
  if (!(x_name %in% rownames(ct))) {
    return(list(
      estimate = NA_real_,
      pvalue = NA_real_,
      nobs = s$nobs,
      std_slope = NA_real_
    ))
  }
  
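  est <- as.numeric(ct[x_name, "Estimate"])
  # The p-value column is "Pr(>|t|)" or "Pr(>|z|)" depending on the model.
  # Indexing a column that does not exist raises an error instead of
  # returning NA, so pick whichever column is actually present.
  p_col <- intersect(c("Pr(>|t|)", "Pr(>|z|)"), colnames(ct))
  pv <- if (length(p_col) > 0) as.numeric(ct[x_name, p_col[1]]) else NA_real_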
  
  sx <- stats::sd(data_used[[x_name]], na.rm = TRUE)
  std_slope <- est * sx
  
  list(
    estimate = est,
    pvalue = pv,
    nobs = s$nobs,
    std_slope = std_slope
  )
}

#' Compute effect at P10 vs P90
effect_p10_p90 <- function(model, data_used, x_name) {
  qs <- stats::quantile(data_used[[x_name]], probs = c(.1, .9), na.rm = TRUE)
  x_lo <- unname(qs[1])
  x_hi <- unname(qs[2])
  
  typical <- data_used[1, , drop = FALSE]
  for (nm in names(typical)) {
    if (nm == x_name) next
    v <- data_used[[nm]]
    if (is.numeric(v)) {
      typical[[nm]] <- stats::median(v, na.rm = TRUE)
    } else if (is.factor(v) || is.character(v)) {
      tab <- sort(table(v), decreasing = TRUE)
      typical[[nm]] <- names(tab)[1]
      if (is.factor(v)) typical[[nm]] <- factor(typical[[nm]], levels = levels(v))
    }
  }
  
  d_lo <- typical
  d_hi <- typical
  d_lo[[x_name]] <- x_lo
  d_hi[[x_name]] <- x_hi
  
  p_lo <- suppressWarnings(stats::predict(model, newdata = d_lo, type = "response"))
  p_hi <- suppressWarnings(stats::predict(model, newdata = d_hi, type = "response"))
  
  as.numeric(p_hi - p_lo)
}

# Sensitivity analysis functions
add_strength_column <- function(specs) {
  if (is.null(specs) || nrow(specs) == 0L) return(specs)
  
  if ("effect_strength" %in% names(specs)) {
    specs$strength <- specs$effect_strength
  } else if ("std_slope" %in% names(specs)) {
    specs$strength <- specs$std_slope
  } else {
    specs$strength <- NA_real_
  }
  specs
}

summarise_sensitivity_overall <- function(specs, p_levels = c(0.05, 0.10, 0.20)) {
  if (is.null(specs) || nrow(specs) == 0L) return(tibble::tibble())
  specs <- add_strength_column(specs)
  
  tibble::tibble(
    n_specs = nrow(specs),
    share_positive = mean(specs$estimate > 0, na.rm = TRUE),
    share_negative = mean(specs$estimate < 0, na.rm = TRUE),
    median_estimate = median(specs$estimate, na.rm = TRUE),
    median_pvalue = median(specs$pvalue, na.rm = TRUE),
    median_strength = median(specs$strength, na.rm = TRUE),
    !!!setNames(
      lapply(p_levels, function(p) mean(specs$pvalue <= p, na.rm = TRUE)),
      paste0("share_p_le_", p_levels)
    )
  )
}

summarise_sign_instability <- function(specs) {
  if (is.null(specs) || nrow(specs) == 0L) return(tibble::tibble())
  s <- sign(specs$estimate)
  s <- s[!is.na(s) & s != 0]
  tibble::tibble(
    share_sign_stable = if (length(s) == 0) NA_real_ else as.numeric(length(unique(s)) <= 1),
    n_nonzero = length(s)
  )
}

summarise_by_fe <- function(specs) {
  if (is.null(specs) || nrow(specs) == 0L) return(tibble::tibble())
  specs <- add_strength_column(specs)
  
  specs %>%
    dplyr::group_by(fe) %>%
    dplyr::summarise(
      n_specs = dplyr::n(),
      share_positive = mean(estimate > 0, na.rm = TRUE),
      share_p10 = mean(pvalue <= 0.10, na.rm = TRUE),
      median_p = median(pvalue, na.rm = TRUE),
      median_strength = median(strength, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    dplyr::arrange(dplyr::desc(share_p10))
}

summarise_by_cluster <- function(specs) {
  if (is.null(specs) || nrow(specs) == 0L) return(tibble::tibble())
  specs <- add_strength_column(specs)
  
  specs %>%
    dplyr::group_by(cluster) %>%
    dplyr::summarise(
      n_specs = dplyr::n(),
      share_positive = mean(estimate > 0, na.rm = TRUE),
      share_p10 = mean(pvalue <= 0.10, na.rm = TRUE),
      median_p = median(pvalue, na.rm = TRUE),
      median_strength = median(strength, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    dplyr::arrange(median_p)
}

summarise_by_controls <- function(specs) {
  if (is.null(specs) || nrow(specs) == 0L) return(tibble::tibble())
  specs <- add_strength_column(specs)
  
  specs %>%
    dplyr::group_by(controls) %>%
    dplyr::summarise(
      n_specs = dplyr::n(),
      share_positive = mean(estimate > 0, na.rm = TRUE),
      share_p10 = mean(pvalue <= 0.10, na.rm = TRUE),
      median_p = median(pvalue, na.rm = TRUE),
      median_strength = median(strength, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    dplyr::arrange(dplyr::desc(share_p10))
}

classify_specs <- function(specs, p_cut = 0.10) {
  if (is.null(specs) || nrow(specs) == 0L) return(tibble::tibble())
  specs %>%
    dplyr::mutate(
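      class = dplyr::case_when(
        # Handle missing values first so that an NA p-value is not
        # silently counted as "insignificant"
        is.na(estimate) | is.na(pvalue) ~ "Missing/NA",
        estimate > 0 & pvalue <= p_cut ~ "Positive & significant",
        estimate > 0 ~ "Positive but insignificant",
        estimate < 0 & pvalue <= p_cut ~ "Negative & significant",
        estimate < 0 ~ "Negative but insignificant",
        TRUE ~ "Missing/NA"  # also catches estimates that are exactly zero
      )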
    ) %>%
    dplyr::count(class) %>%
    dplyr::mutate(share = n / sum(n))
}

top_cells <- function(specs, p_cut = 0.10, n_top = 10) {
  if (is.null(specs) || nrow(specs) == 0L) return(tibble::tibble())
  specs %>%
    dplyr::mutate(p_ok = pvalue <= p_cut) %>%
    dplyr::group_by(fe, cluster, controls) %>%
    dplyr::summarise(
      n = dplyr::n(),
      share_pok = mean(p_ok, na.rm = TRUE),
      median_p = median(pvalue, na.rm = TRUE),
      median_est = median(estimate, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    dplyr::arrange(dplyr::desc(share_pok), median_p) %>%
    dplyr::slice_head(n = n_top)
}

build_sensitivity_bundle <- function(specs) {
  if (is.null(specs) || nrow(specs) == 0L) return(list())
  specs <- add_strength_column(specs)
  
  list(
    overall = summarise_sensitivity_overall(specs),
    sign = summarise_sign_instability(specs),
    by_fe = summarise_by_fe(specs),
    by_cluster = summarise_by_cluster(specs),
    by_controls = summarise_by_controls(specs),
    classes = classify_specs(specs),
    top_cells = top_cells(specs)
  )
}

pick_best_model <- function(results_df,
                            require_positive = TRUE,
                            p_max = 0.10,
                            strength_col = c("effect_strength", "std_slope")) {
  strength_col <- match.arg(strength_col)
  
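  df <- results_df
  if (is.null(df) || nrow(df) == 0L) return(NULL)
  # NA-safe filter: `df$estimate > 0` alone would yield all-NA rows for NA estimates
  if (require_positive) df <- df[!is.na(df$estimate) & df$estimate > 0, , drop = FALSE]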
  df <- df[!is.na(df$pvalue) & df$pvalue <= p_max, , drop = FALSE]
  df <- df[!is.na(df[[strength_col]]), , drop = FALSE]
  if (nrow(df) == 0) return(NULL)
  
  df <- df[order(df[[strength_col]], decreasing = TRUE), , drop = FALSE]
  df[["rank"]] <- seq_len(nrow(df))
  df[1, , drop = FALSE]
}

run_short_subm_specs <- function(reg_data, 
                                 fe_set = c("0", "buyer", "year", "buyer+year"),
                                 cluster_set = c("none", "buyer", "year", "buyer_year", "buyer_buyertype"),
                                 controls_set = c("x_only", "base")) {
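  # Without the key regressor no specification can be estimated
  if (!"short_submission_period" %in% names(reg_data)) return(data.frame())
  
  out <- list()
  k <- 0L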
  
  for (fe in fe_set) {
    fe_part <- make_fe_part(fe)
    
    for (cl in cluster_set) {
      cl_fml <- make_cluster(cl)
      
      for (ctrl in controls_set) {
        rhs_terms <- switch(
          ctrl,
          "x_only" = c("short_submission_period"),
          "base" = c("short_submission_period", "buyer_buyertype", "tender_proceduretype")
        )
        
        rhs_terms <- rhs_terms[rhs_terms %in% names(reg_data)]
        rhs <- paste(rhs_terms, collapse = " + ")
        fml <- stats::as.formula(paste0("ind_corr_binary ~ ", rhs, " | ", fe_part))
        
        m <- safe_fixest(
          fixest::feglm(
            fml,
            family = quasibinomial(link = "logit"),
            data = reg_data,
            cluster = cl_fml
          )
        )
        
        if (is.null(m)) next
        
        eff <- extract_effect_fixest(
          model = m,
          x_name = "short_submission_period",
          data_used = reg_data
        )
        
        eff_strength <- safe_fixest(effect_p10_p90(m, reg_data, "short_submission_period"))
        if (is.null(eff_strength)) eff_strength <- NA_real_
        
        k <- k + 1L
        out[[k]] <- data.frame(
          outcome = "short_subm",
          model_type = "fractional_logit",
          fe = fe,
          cluster = cl,
          controls = ctrl,
          estimate = eff$estimate,
          pvalue = eff$pvalue,
          nobs = eff$nobs,
          std_slope = eff$std_slope,
          effect_strength = eff_strength,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  
  if (length(out) == 0) return(data.frame())
  do.call(rbind, out)
}

run_long_dec_specs <- function(reg_data,
                               fe_set = c("0", "buyer", "year", "buyer+year"),
                               cluster_set = c("none", "buyer", "year", "buyer_year", "buyer_buyertype"),
                               controls_set = c("x_only", "base")) {
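  # Without the key regressor no specification can be estimated
  if (!"long_decision_period" %in% names(reg_data)) return(data.frame())
  
  out <- list()
  k <- 0L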
  
  for (fe in fe_set) {
    fe_part <- make_fe_part(fe)
    
    for (cl in cluster_set) {
      cl_fml <- make_cluster(cl)
      
      for (ctrl in controls_set) {
        rhs_terms <- switch(
          ctrl,
          "x_only" = c("long_decision_period"),
          "base" = c("long_decision_period", "buyer_buyertype", "tender_proceduretype")
        )
        
        rhs_terms <- rhs_terms[rhs_terms %in% names(reg_data)]
        rhs <- paste(rhs_terms, collapse = " + ")
        fml <- stats::as.formula(paste0("ind_corr_binary ~ ", rhs, " | ", fe_part))
        
        m <- safe_fixest(
          fixest::feglm(
            fml,
            family = quasibinomial(link = "logit"),
            data = reg_data,
            cluster = cl_fml
          )
        )
        
        if (is.null(m)) next
        
        eff <- extract_effect_fixest(
          model = m,
          x_name = "long_decision_period",
          data_used = reg_data
        )
        
        eff_strength <- safe_fixest(effect_p10_p90(m, reg_data, "long_decision_period"))
        if (is.null(eff_strength)) eff_strength <- NA_real_
        
        k <- k + 1L
        out[[k]] <- data.frame(
          outcome = "long_dec",
          model_type = "fractional_logit",
          fe = fe,
          cluster = cl,
          controls = ctrl,
          estimate = eff$estimate,
          pvalue = eff$pvalue,
          nobs = eff$nobs,
          std_slope = eff$std_slope,
          effect_strength = eff_strength,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  
  if (length(out) == 0) return(data.frame())
  do.call(rbind, out)
}

# ------------------------------------------------------------------------
# 4. Generic "days between" helper
# ------------------------------------------------------------------------
# Computes the number of days between two date columns, drops rows where
# either date is missing, and filters out negative or implausibly long
# intervals (365 days or more).

compute_tender_days <- function(df, from_col, to_col, new_col) {
  from_quo   <- rlang::enquo(from_col)
  to_quo     <- rlang::enquo(to_col)
  new_col_nm <- rlang::as_name(rlang::enquo(new_col))
  
  df %>%
    dplyr::mutate(
      !!from_quo := as.Date(!!from_quo),
      !!to_quo   := as.Date(!!to_quo)
    ) %>%
    dplyr::filter(
      !is.na(!!from_quo),
      !is.na(!!to_quo)
    ) %>%
    dplyr::mutate(
      !!new_col_nm := as.numeric(!!to_quo - !!from_quo)
    ) %>%
    dplyr::filter(
      !!rlang::sym(new_col_nm) >= 0,
      !!rlang::sym(new_col_nm) < 365
    )
}
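
# Illustrative sketch (not called by the pipeline): the same "days between"
# rule as compute_tender_days(), applied in base R to three made-up tenders.
# The function name `demo_days_between` and the toy dates are assumptions
# chosen for illustration, not part of the original code.
demo_days_between <- function() {
  toy <- data.frame(
    id       = c("a", "b", "c"),
    call     = as.Date(c("2020-01-01", "2020-03-10", "2019-01-01")),
    deadline = as.Date(c("2020-01-31", "2020-03-01", "2020-06-01")),
    stringsAsFactors = FALSE
  )
  toy$days <- as.numeric(toy$deadline - toy$call)
  # "b" has a negative interval and "c" spans more than a year, so only
  # "a" (30 days) passes the plausibility filter
  toy[toy$days >= 0 & toy$days < 365, , drop = FALSE]
}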

# ------------------------------------------------------------------------
# 5. Short / medium submission flags (country-specific thresholds,
#    with median fallback when thresholds are NA)
# ------------------------------------------------------------------------

add_short_deadline_flags <- function(df,
                                     days_col = tender_days_open,
                                     proc_col = tender_proceduretype,
                                     thr) {
  days_col <- rlang::enquo(days_col)
  proc_col <- rlang::enquo(proc_col)
  
  # Medians by procedure type, used as fallback if thresholds are missing
  med_open <- df %>%
    dplyr::filter(!!proc_col == "Open Procedure") %>%
    dplyr::summarise(m = stats::median(!!days_col, na.rm = TRUE)) %>%
    dplyr::pull(m)
  
  med_restricted <- df %>%
    dplyr::filter(!!proc_col == "Restricted Procedure") %>%
    dplyr::summarise(m = stats::median(!!days_col, na.rm = TRUE)) %>%
    dplyr::pull(m)
  
  med_negotiated <- df %>%
    dplyr::filter(!!proc_col == "Negotiated with publications") %>%
    dplyr::summarise(m = stats::median(!!days_col, na.rm = TRUE)) %>%
    dplyr::pull(m)
  
  # If a country threshold is missing (NULL or NA), use the median instead.
  # Note that a median cutoff flags up to half of that procedure type as short.
  pick_cutoff <- function(value, fallback) {
    if (is.null(value) || is.na(value)) fallback else value
  }
  short_open_cutoff <- pick_cutoff(thr$subm_short_open,       med_open)
  short_rest_cutoff <- pick_cutoff(thr$subm_short_restricted, med_restricted)
  short_neg_cutoff  <- pick_cutoff(thr$subm_short_negotiated, med_negotiated)
  
  # For "medium" we only flag if both bounds are provided (still only for Open)
  medium_min <- thr$subm_medium_open_min
  medium_max <- thr$subm_medium_open_max
  use_medium <- !is.null(medium_min) && !is.null(medium_max) &&
    !is.na(medium_min) && !is.na(medium_max)
  
  df %>%
    dplyr::mutate(
      short_deadline = dplyr::case_when(
        !!proc_col == "Open Procedure" &
          !!days_col < short_open_cutoff ~ TRUE,
        !!proc_col == "Restricted Procedure" &
          !!days_col < short_rest_cutoff ~ TRUE,
        !!proc_col == "Negotiated with publications" &
          !!days_col < short_neg_cutoff ~ TRUE,
        TRUE ~ FALSE
      ),
      medium_deadline = dplyr::case_when(
        use_medium &
          !!proc_col == "Open Procedure" &
          !!days_col >= medium_min &
          !!days_col <  medium_max ~ TRUE,
        TRUE ~ FALSE
      )
    )
}
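
# Illustrative sketch (not called by the pipeline): the cutoff rule used
# above, reduced to one procedure type in base R. `demo_short_cutoff` is a
# hypothetical name; it assumes a single numeric vector of days and one
# scalar threshold.
demo_short_cutoff <- function(days, threshold) {
  # Fall back to the median when the legal threshold is missing
  cutoff <- if (is.null(threshold) || is.na(threshold)) {
    stats::median(days, na.rm = TRUE)
  } else {
    threshold
  }
  # Missing durations are never flagged, mirroring the TRUE ~ FALSE branch
  list(cutoff = cutoff, short = !is.na(days) & days < cutoff)
}
# demo_short_cutoff(c(10, 20, 40), NA)$cutoff  # median fallback
# demo_short_cutoff(c(10, 20, 40), 30)$short   # explicit threshold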



# ------------------------------------------------------------------------
# 6. Long decision flag (country-specific thresholds)
# ------------------------------------------------------------------------
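# Flags "long" decisions for selected procedures using a single threshold
# in days, typically representing a long administrative delay.
# Unlike the short-deadline flags there is no median fallback: if
# thr$long_decision_days is NA, no tender is flagged (all values are FALSE).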

add_long_decision_flag <- function(df,
                                   days_col = tender_days_dec,
                                   proc_col = tender_proceduretype,
                                   thr) {
  days_col <- rlang::enquo(days_col)
  proc_col <- rlang::enquo(proc_col)
  
  df %>%
    dplyr::mutate(
      long_decision = dplyr::case_when(
        !!proc_col %in% c(
          "Open Procedure",
          "Restricted Procedure",
          "Negotiated with publications"
        ) & !!days_col >= thr$long_decision_days ~ TRUE,
        TRUE ~ FALSE
      )
    )
}

# ------------------------------------------------------------------------
# 7. Generic histogram with quartiles
# ------------------------------------------------------------------------
# Plots a histogram of time intervals (in days) and overlays quartiles.
# Can be used overall or faceted by a categorical variable (e.g. procedure).

plot_days_hist_with_quartiles <- function(data,
                                          days_var,
                                          facet_var = NULL,
                                          title,
                                          x_lab,
                                          y_lab   = "Number of tenders",
                                          caption = NULL,
                                          binwidth = 5,
                                          xlim     = c(0, 365)) {
  days_sym <- rlang::sym(days_var)
  
  base <- ggplot2::ggplot(data, ggplot2::aes(x = !!days_sym)) +
    ggplot2::geom_histogram(
      binwidth = binwidth,
      fill     = "lightblue",
      color    = "white",
      boundary = 0
    ) +
    ggplot2::scale_y_continuous(
      expand = ggplot2::expansion(mult = c(0, 0.25))
    ) +
    ggplot2::coord_cartesian(xlim = xlim, clip = "off") +
    ggplot2::labs(
      title   = title,
      x       = x_lab,
      y       = y_lab,
      caption = caption
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(
      plot.title.position = "plot",
      plot.title          = ggplot2::element_text(margin = ggplot2::margin(b = 20)),
      plot.caption        = ggplot2::element_text(
        hjust  = 0,
        face   = "italic",
        size   = 10,
        margin = ggplot2::margin(t = 10)
      )
    )
  
  if (is.null(facet_var)) {
    # overall quartiles
    q <- stats::quantile(data[[days_var]], probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
    
    base +
      ggplot2::geom_vline(
        xintercept = q,
        color      = "blue",
        linetype   = c("dashed", "solid", "dashed"),
        linewidth  = 1  # `linewidth` replaces `size` for lines (ggplot2 >= 3.4.0)
      ) +
      ggplot2::annotate(
        "text",
        x     = q,
        y     = Inf,
        label = paste0(names(q), ": ", round(q, 1), " days"),
        color = "blue",
        size  = 4,
        angle = 45,
        vjust = -1,
        hjust = 0
      )
    
  } else {
    facet_sym <- rlang::sym(facet_var)
    
    q_by_facet <- data %>%
      dplyr::group_by(!!facet_sym) %>%
      dplyr::summarise(
        q25 = stats::quantile(!!days_sym, 0.25, na.rm = TRUE),
        q50 = stats::quantile(!!days_sym, 0.50, na.rm = TRUE),
        q75 = stats::quantile(!!days_sym, 0.75, na.rm = TRUE),
        .groups = "drop"
      ) %>%
      tidyr::pivot_longer(
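        # Name the columns explicitly: starts_with("q") would also pick up
        # a facet variable whose name begins with "q"
        cols      = c(q25, q50, q75),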
        names_to  = "quartile",
        values_to = "xint"
      ) %>%
      dplyr::mutate(
        quartile_label = dplyr::case_when(
          quartile == "q25" ~ "25%",
          quartile == "q50" ~ "50% (median)",
          quartile == "q75" ~ "75%",
          TRUE              ~ quartile
        ),
        linetype = dplyr::case_when(
          quartile == "q50" ~ "solid",
          TRUE              ~ "dashed"
        )
      )
    
    base +
      ggplot2::geom_vline(
        data      = q_by_facet,
        ggplot2::aes(
          xintercept = xint,
          linetype   = quartile_label
        ),
        color = "blue",
        linewidth = 0.9  # `linewidth` replaces `size` for lines (ggplot2 >= 3.4.0)
      ) +
      ggrepel::geom_text_repel(
        data = q_by_facet,
        ggplot2::aes(
          x     = xint,
          y     = Inf,
          label = paste0(quartile_label, ": ", round(xint, 1), " days")
        ),
        inherit.aes        = FALSE,
        color              = "blue",
        size               = 3.3,
        angle              = 90,
        vjust              = 1.2,
        min.segment.length = 0,
        segment.color      = "blue",
        box.padding        = 0.5,
        direction          = "x",
        max.overlaps       = Inf
      ) +
      ggplot2::facet_wrap(stats::as.formula(paste("~", facet_var)), scales = "free_y") +
      ggplot2::scale_linetype_manual(
        name   = NULL,
        values = c(
          "25%"          = "dashed",
          "50% (median)" = "solid",
          "75%"          = "dashed"
        )
      )
  }
}

# ------------------------------------------------------------------------
# 8. Procedure-type shares (value + count)
# ------------------------------------------------------------------------
# Aggregates contracts by procedure type and computes their share in total
# value and total number of contracts.

build_proc_share_data <- function(df) {
  df %>%
    dplyr::mutate(
      tender_proceduretype = recode_procedure_type(tender_proceduretype),
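      # fct_na_value_to_level() supersedes the deprecated fct_explicit_na()
      # (requires forcats >= 1.0.0)
      tender_proceduretype = forcats::fct_na_value_to_level(
        as.factor(tender_proceduretype),
        level = "Missing value"
      )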
    ) %>%
    dplyr::group_by(tender_proceduretype) %>%
    dplyr::summarise(
      total_value = sum(bid_priceusd, na.rm = TRUE),
      n_contracts = dplyr::n(),
      .groups     = "drop"
    ) %>%
    dplyr::mutate(
      share_value     = total_value / sum(total_value),
      share_contracts = n_contracts / sum(n_contracts)
    )
}
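
# Illustrative sketch (not called by the pipeline): why value shares and
# count shares are reported side by side. Base R on three made-up
# contracts; `demo_proc_shares` and the toy figures are assumptions.
demo_proc_shares <- function() {
  toy <- data.frame(
    proc  = c("Open", "Open", "Direct"),
    value = c(100, 300, 600),
    stringsAsFactors = FALSE
  )
  agg <- stats::aggregate(value ~ proc, data = toy, FUN = sum)
  agg$n <- as.vector(table(toy$proc)[agg$proc])
  agg$share_value     <- agg$value / sum(agg$value)
  agg$share_contracts <- agg$n / sum(agg$n)
  # "Direct" holds 60% of the value with only one of three contracts
  agg
}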

plot_proc_share_value <- function(plot_data) {
  ggplot2::ggplot(
    plot_data,
    ggplot2::aes(
      x = stats::reorder(tender_proceduretype, share_value),
      y = share_value
    )
  ) +
    ggplot2::geom_col(
      ggplot2::aes(fill = tender_proceduretype),
      show.legend = FALSE,
      width       = 0.6
    ) +
    ggplot2::geom_text(
      ggplot2::aes(
        label = paste0(
          scales::percent(share_value, accuracy = 0.1),
          " (",
          scales::dollar(total_value, scale = 1e-6, suffix = "M"),
          ")"
        )
      ),
      hjust = -0.05,
      size  = 4
    ) +
    ggplot2::scale_y_continuous(
      labels = scales::percent_format(accuracy = 1),
      expand = ggplot2::expansion(mult = c(0, 0.4))
    ) +
    ggplot2::scale_fill_brewer(palette = "Blues", direction = -1) +
    ggplot2::coord_flip() +
    ggplot2::labs(
      title   = "Share of contract value",
      x       = NULL,
      y       = "Share of total value",
      caption = "Values in millions of USD"
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(
      plot.margin = ggplot2::margin(10, 30, 10, 10),
      axis.text.y = ggplot2::element_text(size = 14)
    )
}

plot_proc_share_count <- function(plot_data) {
  ggplot2::ggplot(
    plot_data,
    ggplot2::aes(
      x = stats::reorder(tender_proceduretype, share_value),  # keep same order
      y = share_contracts
    )
  ) +
    ggplot2::geom_col(
      ggplot2::aes(fill = tender_proceduretype),
      show.legend = FALSE,
      width       = 0.6
    ) +
    ggplot2::geom_text(
      ggplot2::aes(
        label = paste0(
          scales::percent(share_contracts, accuracy = 0.1),
          " (", n_contracts, " contracts)"
        )
      ),
      hjust = -0.05,
      size  = 4
    ) +
    ggplot2::scale_y_continuous(
      labels = scales::percent_format(accuracy = 1),
      expand = ggplot2::expansion(mult = c(0, 0.4))
    ) +
    ggplot2::scale_fill_brewer(palette = "Blues", direction = -1) +
    ggplot2::coord_flip() +
    ggplot2::labs(
      title = "Share of number of contracts",
      x     = NULL,
      y     = "Share of contracts"
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(
      plot.margin = ggplot2::margin(10, 30, 10, 0),
      axis.text.y = ggplot2::element_text(size = 14)
    )
}

# ------------------------------------------------------------------------
# 9. Threshold configuration and accessor
# ------------------------------------------------------------------------
# Stores and retrieves country-specific thresholds for short/medium
# submission periods and long decisions.

admin_threshold_config <- tibble::tribble(
  ~country_code, ~subm_short_open, ~subm_short_restricted, ~subm_short_negotiated,
  ~subm_medium_open_min, ~subm_medium_open_max, ~long_decision_days,
  
  "DEFAULT", 30, 30, 30, 30, 30, 60,
  # Specify country deadlines here. Laws may state deadlines in business
  # days, whereas this pipeline measures calendar days, so convert business
  # days to calendar days first.
  "UY",      21, 14, 14, 21, 28, 56,
  "BG",      30, 30, 30, 30, 30, NA,
  "ID",      3, 3, NA, 3, 5, NA
)


get_admin_thresholds <- function(country_code) {
  cc <- toupper(country_code)
  
  row <- admin_threshold_config %>%
    dplyr::filter(country_code %in% c(cc, "DEFAULT")) %>%
    dplyr::arrange(dplyr::desc(country_code == cc)) %>%
    dplyr::slice(1)
  
  row <- dplyr::select(row, -country_code)
  as.list(row)
}
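
# Illustrative sketch (not called by the pipeline): the row-level fallback
# used by get_admin_thresholds(), on a hypothetical two-row config in base
# R. `demo_threshold_lookup` and its toy config are assumptions. As in the
# function above, the fallback is per row: a matched country row is returned
# as is, so an NA inside it is not filled from "DEFAULT".
demo_threshold_lookup <- function(code) {
  cfg <- data.frame(
    country_code       = c("DEFAULT", "UY"),
    long_decision_days = c(60, 56),
    stringsAsFactors   = FALSE
  )
  hit <- cfg[cfg$country_code == toupper(code), , drop = FALSE]
  if (nrow(hit) == 0L) {
    hit <- cfg[cfg$country_code == "DEFAULT", , drop = FALSE]
  }
  as.list(hit[1, names(hit) != "country_code", drop = FALSE])
}
# demo_threshold_lookup("uy")$long_decision_days  # country-specific row
# demo_threshold_lookup("xx")$long_decision_days  # falls back to "DEFAULT"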

# ------------------------------------------------------------------------
# 10. Year-filter configuration for regressions (by country and component)
# ------------------------------------------------------------------------
# Controls which tender years enter different parts of the pipeline.

year_filter_config <- tibble::tribble(
  ~component, ~country_code, ~min_year, ~max_year,
  
  # default catch-all (no explicit filtering)
  "default",  "BG",          NA,       NA,
  "default",  "UY",          NA,       NA,
  "default",  "ID",          NA,       NA,
  
  # component-specific overrides for single bidding
  "singleb",  "BG",          2011,     2018,
  "singleb",  "UY",          2014,     NA,
  "singleb",  "ID",          2012,     2018
)

# Convenience helper: returns list(min_year, max_year); missing (NA) bounds
# become -Inf / Inf, i.e. no filtering on that side
get_year_range <- function(country_code,
                           component = c("singleb", "default")) {
  component <- match.arg(component)
  cc        <- toupper(country_code)
  
  # 1) component-specific rule
  row_spec <- year_filter_config %>%
    dplyr::filter(component == !!component, country_code == !!cc) %>%
    dplyr::slice_head(n = 1)
  
  # 2) fall back to default rule for that country
  if (nrow(row_spec) == 0) {
    row_spec <- year_filter_config %>%
      dplyr::filter(component == "default", country_code == !!cc) %>%
      dplyr::slice_head(n = 1)
  }
  
  # 3) if still nothing, no filtering
  if (nrow(row_spec) == 0) {
    return(list(min_year = -Inf, max_year = Inf))
  }
  
  min_y <- if (is.na(row_spec$min_year)) -Inf else row_spec$min_year
  max_y <- if (is.na(row_spec$max_year))  Inf else row_spec$max_year
  
  list(min_year = min_y, max_year = max_y)
}
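
# Illustrative sketch (not called by the pipeline): one way the returned
# range could be applied to a vector of years. `demo_year_filter` is a
# hypothetical helper, and treating both bounds as inclusive is an
# assumption; this section does not show how the range is used.
demo_year_filter <- function(years, min_year = NA, max_year = NA) {
  lo <- if (is.na(min_year)) -Inf else min_year
  hi <- if (is.na(max_year))  Inf else max_year
  years[!is.na(years) & years >= lo & years <= hi]
}
# demo_year_filter(2010:2020, min_year = 2014)  # open-ended upper bound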

# ========================================================================
# Unified administrative efficiency pipeline
# ========================================================================

run_admin_efficiency_pipeline <- function(df, country_code = "GEN", output_dir) {
  message("Running administrative efficiency pipeline for ", country_code, " ...")
  
  # country-specific thresholds
  thr <- get_admin_thresholds(country_code)
  
  # ensure tender_year is available
  df <- df %>% add_tender_year()
  
  # year range for single-bidding regressions (short/long)
  yr_singleb <- get_year_range(country_code, component = "singleb")
  min_year_singleb <- yr_singleb$min_year
  max_year_singleb <- yr_singleb$max_year
  
  # ----------------------------------------------------------------------
  # SUMMARY STATS BLOCK (printed once at the beginning)
  # ----------------------------------------------------------------------
  # 1) Number of contracts per year
  n_obs_per_year <- df %>%
    dplyr::count(tender_year, name = "n_observations")
  
  # 2) Number of unique buyers (using buyer_masterid)
  n_unique_buyers <- if ("buyer_masterid" %in% names(df)) {
    dplyr::n_distinct(df$buyer_masterid, na.rm = TRUE)
  } else {
    NA_integer_
  }
  
  # 3) Number of unique tenders per year
  tender_year_tenders <- if ("tender_id" %in% names(df)) {
    df %>%
      dplyr::group_by(tender_year) %>%
      dplyr::summarise(
        n_unique_tender_id = dplyr::n_distinct(tender_id),
        .groups = "drop"
      )
  } else {
    NULL
  }
  
  # 4) Number of unique bidders (using bidder_masterid)
  n_unique_bidders <- if ("bidder_masterid" %in% names(df)) {
    dplyr::n_distinct(df$bidder_masterid, na.rm = TRUE)
  } else {
    NA_integer_
  }
  
  # 5) List of variables present (excluding ind_* variables)
  vars_present <- names(df)
  vars_present <- vars_present[!startsWith(vars_present, "ind_")]
  
  summary_stats <- list(
    n_obs_per_year      = n_obs_per_year,
    n_unique_buyers     = n_unique_buyers,
    tender_year_tenders = tender_year_tenders,
    n_unique_bidders    = n_unique_bidders,
    vars_present        = vars_present
  )
  
  cat("---------- DATA SUMMARY FOR", country_code, "----------\n\n")
  
  cat("Contracts per year:\n")
  print(n_obs_per_year)
  cat("\n")
  
  if (!is.na(n_unique_buyers)) {
    cat("Number of unique buyers (buyer_masterid): ", n_unique_buyers, "\n\n")
  } else {
    cat("Column 'buyer_masterid' not found: cannot compute number of unique buyers.\n\n")
  }
  
  if (!is.null(tender_year_tenders)) {
    cat("Number of unique tenders per tender_year:\n")
    print(tender_year_tenders)
    cat("\n")
  } else {
    cat("Column 'tender_id' not found: cannot compute unique tenders per year.\n\n")
  }
  
  if (!is.na(n_unique_bidders)) {
    cat("Number of unique bidders (bidder_masterid): ", n_unique_bidders, "\n\n")
  } else {
    cat("Column 'bidder_masterid' not found: cannot compute number of unique bidders.\n\n")
  }
  
  cat("Variables present (excluding indicators):\n")
  print(vars_present)
  cat("\n-----------------------------------------------\n\n")
  
  
  # ----------------------------------------------------------------------
  # A) Share of procedure types by contract value and count
  # ----------------------------------------------------------------------
  proc_share_data <- build_proc_share_data(df)
  
  sh      <- plot_proc_share_value(proc_share_data)
  p_count <- plot_proc_share_count(proc_share_data)
  
  # wrap_plots() works even when patchwork has not been loaded via library()
  combined_proc <- patchwork::wrap_plots(sh, p_count, ncol = 2)
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "share_value_vs_contracts.png"),
    plot     = combined_proc,
    width    = 19,
    height   = 8,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # B) Days between call for tender and bid deadline
  # ----------------------------------------------------------------------
  tender_periods_open <- compute_tender_days(
    df,
    tender_publications_firstcallfortenderdate,
    tender_biddeadline,
    tender_days_open
  )
  
  subm <- plot_days_hist_with_quartiles(
    data      = tender_periods_open,
    days_var  = "tender_days_open",
    facet_var = NULL,
    title     = "Days for bid submission",
    x_lab     = "Days between call opening and bid submission deadline",
    caption   = "Vertical lines indicate the 25th, 50th (median), and 75th percentiles (quartiles)"
  )
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "subm.png"),
    plot     = subm,
    width    = 10,
    height   = 6,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # C) Days between call and bid deadline by procedure type
  # ----------------------------------------------------------------------
  tender_periods_open_proc <- tender_periods_open %>%
    dplyr::mutate(
      tender_proceduretype = recode_procedure_type(tender_proceduretype)
    ) %>%
    tidyr::drop_na(tender_proceduretype)
  
  subm_proc_facet_q <- plot_days_hist_with_quartiles(
    data      = tender_periods_open_proc,
    days_var  = "tender_days_open",
    facet_var = "tender_proceduretype",
    title     = "Days for bid submission by procedure type",
    x_lab     = "Days between call opening and bid submission deadline",
    caption   = "Blue lines indicate the 25th, 50th (median), and 75th percentiles (quartiles) within each procedure type"
  )
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "subm_proc_fac.png"),
    plot     = subm_proc_facet_q,
    width    = 10,
    height   = 6,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # D) Too short submission periods (overall, by buyer, by value)
  # ----------------------------------------------------------------------
  tender_periods_short <- tender_periods_open_proc %>%
    dplyr::filter(
      tender_proceduretype %in% c(
        "Open Procedure",
        "Restricted Procedure",
        "Negotiated with publications"
      )
    ) %>%
    add_short_deadline_flags(
      days_col = tender_days_open,
      proc_col = tender_proceduretype,
      thr      = thr
    )
  
  
  # Distribution with coloured bars
  subm_r <- ggplot2::ggplot(
    tender_periods_short,
    ggplot2::aes(
      x = tender_days_open,
      fill = dplyr::case_when(
        short_deadline  ~ "red",
        medium_deadline ~ "yellow",
        TRUE            ~ "lightblue"
      )
    )
  ) +
    ggplot2::geom_histogram(
      binwidth = 1,
      boundary = 0,
      colour   = "white"
    ) +
    ggplot2::facet_wrap(~ tender_proceduretype, scales = "free_y") +
    ggplot2::scale_fill_identity() +
    ggplot2::xlim(0, 60) +
    ggplot2::labs(
      x        = "Days between call opening and bid submission deadline",
      y        = "Number of tenders",
      title    = "Distribution of tender open periods by procedure type",
      subtitle = "Bars highlighted: red = short deadline; yellow = medium deadline"
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(legend.position = "none")
  
  share_labels_short <- tender_periods_short %>%
    dplyr::group_by(tender_proceduretype) %>%
    dplyr::summarise(
      share_short = mean(short_deadline, na.rm = TRUE) * 100,
      .groups     = "drop"
    )
  
  subm_r <- subm_r +
    ggplot2::geom_text(
      data = share_labels_short,
      ggplot2::aes(
        x = 50,
        y = Inf,
        label = paste0(
          "Share of contracts\nwith short deadlines: ",
          round(share_short, 1), "%"
        )
      ),
      vjust        = 2,
      hjust        = 1,
      size         = 4.5,
      fontface     = "bold",
      inherit.aes  = FALSE
    )
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "subm_r.png"),
    plot     = subm_r,
    width    = 10,
    height   = 6,
    dpi      = 300
  )
  
  # --- Shares by buyers (number & value) --------------------------------
  tender_periods_buyer <- tender_periods_short %>%
    dplyr::mutate(
      buyer_group = add_buyer_group(buyer_buyertype)
    )
  
  # 1) by number of contracts
  short_share_buyer_proc <- tender_periods_buyer %>%
    dplyr::group_by(buyer_group, tender_proceduretype) %>%
    dplyr::summarise(
      share_short = mean(short_deadline, na.rm = TRUE),
      n_tenders   = dplyr::n(),
      .groups     = "drop"
    ) %>%
    dplyr::mutate(share_other = 1 - share_short) %>%
    tidyr::pivot_longer(
      cols      = c(share_short, share_other),
      names_to  = "deadline_type",
      values_to = "share"
    )
  
  buyer_short <- ggplot2::ggplot(
    short_share_buyer_proc,
    ggplot2::aes(
      x    = buyer_group,
      y    = share,
      fill = deadline_type
    )
  ) +
    ggplot2::geom_col(position = "fill") +
    ggplot2::geom_text(
      ggplot2::aes(label = scales::percent(share, accuracy = 1)),
      position = ggplot2::position_fill(vjust = 0.5),
      color    = "white",
      size     = 4,
      fontface = "bold"
    ) +
    ggplot2::scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    ggplot2::scale_fill_manual(
      values = c("share_short" = "tomato2", "share_other" = "steelblue2"),
      breaks = c("share_short", "share_other"),
      labels = c("Short submission period", "Other submission periods")
    ) +
    ggplot2::facet_wrap(~ tender_proceduretype) +
    ggplot2::labs(
      x     = "Buyer group",
      y     = "Share of tenders (100%)",
      fill  = NULL,
      title = "Short tender submission periods\n(calculated in number of contracts)"
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(
      axis.text.x    = ggplot2::element_text(angle = 45, hjust = 1),
      legend.position = "top"
    )
  
  
  # 2) by value of contracts
  short_share_value_buyer_proc <- tender_periods_buyer %>%
    dplyr::group_by(buyer_group, tender_proceduretype) %>%
    dplyr::summarise(
      total_value = sum(bid_priceusd, na.rm = TRUE),
      short_value = sum(bid_priceusd[short_deadline %in% TRUE], na.rm = TRUE),
      share_short = dplyr::if_else(total_value > 0, short_value / total_value, NA_real_),
      n_contracts = dplyr::n(),
      .groups     = "drop"
    ) %>%
    dplyr::mutate(share_other = 1 - share_short) %>%
    tidyr::pivot_longer(
      cols      = c(share_short, share_other),
      names_to  = "deadline_type",
      values_to = "share"
    )
  
  buyer_short_v <- ggplot2::ggplot(
    short_share_value_buyer_proc,
    ggplot2::aes(
      x    = buyer_group,
      y    = share,
      fill = deadline_type
    )
  ) +
    ggplot2::geom_col(position = "fill") +
    ggplot2::geom_text(
      ggplot2::aes(label = scales::percent(share, accuracy = 1)),
      position = ggplot2::position_fill(vjust = 0.5),
      color    = "white",
      size     = 4,
      fontface = "bold"
    ) +
    ggplot2::scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    ggplot2::scale_fill_manual(
      values = c("share_short" = "tomato2", "share_other" = "steelblue2"),
      breaks = c("share_short", "share_other"),
      labels = c("Short submission period", "Other submission periods")
    ) +
    ggplot2::facet_wrap(~ tender_proceduretype) +
    ggplot2::labs(
      x     = "Buyer group",
      y     = "Share of contract value (100%)",
      fill  = NULL,
      title = "Short tender submission periods\n(calculated in value of contracts)"
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(
      axis.text.x    = ggplot2::element_text(angle = 45, hjust = 1),
      legend.position = "top"
    )
  
  combined_short_buyer <- buyer_short + buyer_short_v +
    patchwork::plot_layout(nrow = 2)
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "short_submission_buyer.png"),
    plot     = combined_short_buyer,
    width    = 12,
    height   = 12,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # E) Days between submission deadline and award/signature date
  # (uses contract signature if available, otherwise award decision)
  # ----------------------------------------------------------------------
  # If tender_contractsignaturedate is missing, create it as an all-NA Date
  # column so that coalesce() below falls back to the award decision date.
  if (!"tender_contractsignaturedate" %in% names(df)) {
    df <- df %>%
      dplyr::mutate(tender_contractsignaturedate = as.Date(NA))
  }
  
  df_with_end_date <- df %>%
    dplyr::mutate(
      decision_end_date = dplyr::coalesce(
        as.Date(tender_contractsignaturedate),
        as.Date(tender_awarddecisiondate)
      )
    )
  
  tender_periods_dec <- compute_tender_days(
    df_with_end_date,
    tender_biddeadline,
    decision_end_date,
    tender_days_dec
  )
  
  decp <- plot_days_hist_with_quartiles(
    data      = tender_periods_dec,
    days_var  = "tender_days_dec",
    facet_var = NULL,
    title     = "Days for award decision",
    x_lab     = "Days between bid submission deadline and contract award",
    caption   = "Vertical lines indicate the 25th, 50th (median), and 75th percentiles (quartiles)"
  )
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "decp.png"),
    plot     = decp,
    width    = 10,
    height   = 6,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # F) Days between submission and award date by procedure type
  # ----------------------------------------------------------------------
  tender_periods_dec_proc <- tender_periods_dec %>%
    dplyr::mutate(
      tender_proceduretype = recode_procedure_type(tender_proceduretype)
    ) %>%
    tidyr::drop_na(tender_proceduretype)
  
  decp_proc_facet_q <- plot_days_hist_with_quartiles(
    data      = tender_periods_dec_proc,
    days_var  = "tender_days_dec",
    facet_var = "tender_proceduretype",
    title     = "Days for award decision",
    x_lab     = "Days between bid submission deadline and contract award",
    caption   = "Blue lines indicate the 25th, 50th (median), and 75th percentiles (quartiles) within each procedure type"
  )
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "decp_proc_fac.png"),
    plot     = decp_proc_facet_q,
    width    = 10,
    height   = 6,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # G) “Too long” periods by procedure (long-decision threshold)
  # ----------------------------------------------------------------------
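  # Mirrors the original logic: "long" periods are flagged on the call-to-bid
  # interval (tender_days_open), using the same 56-day style threshold.
  # Effective long threshold for the "too long" descriptive block:
  # thr$long_decision_days when set, otherwise the median of tender_days_open
  # across the three procedure types listed below.
  long_threshold_open <- if (is.null(thr$long_decision_days) ||
                             is.na(thr$long_decision_days)) {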
    tender_periods_open_proc %>%
      dplyr::filter(
        tender_proceduretype %in% c(
          "Open Procedure",
          "Restricted Procedure",
          "Negotiated with publications"
        )
      ) %>%
      dplyr::summarise(m = stats::median(tender_days_open, na.rm = TRUE)) %>%
      dplyr::pull(m)
  } else {
    thr$long_decision_days
  }
  
  
  thr_long_open <- thr
  thr_long_open$long_decision_days <- long_threshold_open
  
  
  tender_periods_long <- tender_periods_open_proc %>%
    dplyr::filter(
      tender_proceduretype %in% c(
        "Open Procedure",
        "Restricted Procedure",
        "Negotiated with publications"
      )
    ) %>%
    add_long_decision_flag(
      days_col = tender_days_open,
      proc_col = tender_proceduretype,
      thr      = thr_long_open
    )
  
  long_thr_label_ge <- paste0("≥ ", round(long_threshold_open), " days")
  long_thr_label_lt <- paste0("< ",  round(long_threshold_open), " days")
  
  
  decp_r <- ggplot2::ggplot(
    tender_periods_long,
    ggplot2::aes(
      x = tender_days_open,
      fill = dplyr::case_when(
        tender_proceduretype %in% c("Open Procedure", "Restricted Procedure") &
          tender_days_open >= long_threshold_open ~ "red",
        TRUE ~ "lightblue"
      )
    )
  ) +
    ggplot2::geom_histogram(
      binwidth = 4,
      boundary = 0,
      colour   = "white"
    ) +
    ggplot2::facet_wrap(~ tender_proceduretype, scales = "free_y") +
    ggplot2::scale_fill_identity() +
    ggplot2::xlim(0, 300) +
    ggplot2::labs(
      x        = "Days between bid submission deadline and contract award",
      y        = "Number of tenders",
      title    = "Distribution of tender decision periods by procedure type",
      subtitle = paste0(
        "Bars highlighted in red: periods ", long_thr_label_ge,
        " (country-specific long threshold)"
      )
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(legend.position = "none")
  
  
  share_labels_long <- tender_periods_long %>%
    dplyr::group_by(tender_proceduretype) %>%
    dplyr::summarise(
      share_long = mean(tender_days_open >= long_threshold_open, na.rm = TRUE) * 100,
      .groups    = "drop"
    )
  
  decp_r <- decp_r +
    ggplot2::geom_text(
      data = share_labels_long,
      ggplot2::aes(
        x = 200,
        y = Inf,
        label = paste0(
          "Share of contracts\nwith delayed decision: ",
          round(share_long, 1), "%"
        )
      ),
      vjust       = 2,
      hjust       = 0.75,
      size        = 4.5,
      fontface    = "bold",
      inherit.aes = FALSE
    )
  
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "decp_r.png"),
    plot     = decp_r,
    width    = 10,
    height   = 6,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # H) Share of delayed decisions by buyer type (number & value)
  # ----------------------------------------------------------------------
  tender_periods_labeled_dec <- tender_periods_long %>%
    dplyr::mutate(
      buyer_group = add_buyer_group(buyer_buyertype)
    )
  
  # 1) by number of contracts
  long_share_buyer_proc <- tender_periods_labeled_dec %>%
    dplyr::group_by(buyer_group, tender_proceduretype) %>%
    dplyr::summarise(
      share_long = mean(long_decision, na.rm = TRUE),
      n_tenders  = dplyr::n(),
      .groups    = "drop"
    ) %>%
    dplyr::mutate(share_other = 1 - share_long) %>%
    tidyr::pivot_longer(
      cols      = c(share_long, share_other),
      names_to  = "decision_type",
      values_to = "share"
    )
  
  buyer_long <- ggplot2::ggplot(
    long_share_buyer_proc,
    ggplot2::aes(
      x    = buyer_group,
      y    = share,
      fill = decision_type
    )
  ) +
    ggplot2::geom_col(position = "fill") +
    ggplot2::geom_text(
      ggplot2::aes(label = scales::percent(share, accuracy = 1)),
      position = ggplot2::position_fill(vjust = 0.5),
      color    = "white",
      size     = 4,
      fontface = "bold"
    ) +
    ggplot2::scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    ggplot2::scale_fill_manual(
      values = c("share_long" = "tomato2", "share_other" = "steelblue2"),
      breaks = c("share_long", "share_other"),
      labels = c(long_thr_label_ge, long_thr_label_lt)
    ) +
    ggplot2::facet_wrap(~ tender_proceduretype) +
    ggplot2::labs(
      x     = "Buyer group",
      y     = "Share of tenders (100%)",
      fill  = NULL,
      title = paste0(
        "Long tender decision periods (", long_thr_label_ge, ")\n",
        "(calculated in number of contracts)"
      )
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(
      axis.text.x    = ggplot2::element_text(angle = 45, hjust = 1),
      legend.position = "top"
    )
  
  
  
  # 2) by value of contracts
  long_share_value_buyer_proc <- tender_periods_labeled_dec %>%
    dplyr::group_by(buyer_group, tender_proceduretype) %>%
    dplyr::summarise(
      total_value = sum(bid_priceusd, na.rm = TRUE),
      long_value  = sum(bid_priceusd[long_decision %in% TRUE], na.rm = TRUE),
      share_long  = dplyr::if_else(total_value > 0, long_value / total_value, NA_real_),
      n_contracts = dplyr::n(),
      .groups     = "drop"
    ) %>%
    dplyr::mutate(share_other = 1 - share_long) %>%
    tidyr::pivot_longer(
      cols      = c(share_long, share_other),
      names_to  = "decision_type",
      values_to = "share"
    )
  
  buyer_long_v <- ggplot2::ggplot(
    long_share_value_buyer_proc,
    ggplot2::aes(
      x    = buyer_group,
      y    = share,
      fill = decision_type
    )
  ) +
    ggplot2::geom_col(position = "fill") +
    ggplot2::geom_text(
      ggplot2::aes(label = scales::percent(share, accuracy = 1)),
      position = ggplot2::position_fill(vjust = 0.5),
      color    = "white",
      size     = 4,
      fontface = "bold"
    ) +
    ggplot2::scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
    ggplot2::scale_fill_manual(
      values = c("share_long" = "tomato2", "share_other" = "steelblue2"),
      breaks = c("share_long", "share_other"),
      labels = c(long_thr_label_ge, long_thr_label_lt)
    ) +
    ggplot2::facet_wrap(~ tender_proceduretype) +
    ggplot2::labs(
      x     = "Buyer group",
      y     = "Share of contract value (100%)",
      fill  = NULL,
      title = paste0(
        "Long tender decision periods (", long_thr_label_ge, ")\n",
        "(calculated in value of contracts)"
      )
    ) +
    ggplot2::theme_minimal(base_size = 14) +
    ggplot2::theme(
      axis.text.x    = ggplot2::element_text(angle = 45, hjust = 1),
      legend.position = "top"
    )
  
  
  
  combined_dec_plot <- buyer_long + buyer_long_v +
    patchwork::plot_layout(nrow = 2)
  
  ggplot2::ggsave(
    filename = file.path(output_dir, "long_decision_buyer.png"),
    plot     = combined_dec_plot,
    width    = 12,
    height   = 12,
    dpi      = 300
  )
  
  # ----------------------------------------------------------------------
  # I) Effect of shortened period on single bidding (regression WITH SPECIFICATION TESTING)
  # ----------------------------------------------------------------------
  message("\n", strrep("-", 60))
  message("Running specification testing for SHORT submission period...")
  message(strrep("-", 60))
  
  reg_short_base <- df %>%
    dplyr::mutate(
      tender_publications_firstcallfortenderdate =
        as.Date(tender_publications_firstcallfortenderdate),
      tender_biddeadline = as.Date(tender_biddeadline),
      tender_days_open   = as.numeric(
        tender_biddeadline - tender_publications_firstcallfortenderdate
      )
    ) %>%
    add_tender_year() %>%
    dplyr::filter(
      tender_year >= min_year_singleb,
      tender_year <= max_year_singleb,
      !is.na(tender_days_open),
      tender_days_open >= 0,
      tender_days_open < 365,
      tender_proceduretype %in% c("OPEN", "RESTRICTED", "NEGOTIATED_WITH_PUBLICATION")
    )
  
  # Effective short thresholds for regression (median fallback)
  short_open_reg <- if (is.null(thr$subm_short_open) ||
                        is.na(thr$subm_short_open)) {
    stats::median(
      reg_short_base$tender_days_open[reg_short_base$tender_proceduretype == "OPEN"],
      na.rm = TRUE
    )
  } else thr$subm_short_open
  
  short_rest_reg <- if (is.null(thr$subm_short_restricted) ||
                        is.na(thr$subm_short_restricted)) {
    stats::median(
      reg_short_base$tender_days_open[reg_short_base$tender_proceduretype == "RESTRICTED"],
      na.rm = TRUE
    )
  } else thr$subm_short_restricted
  
  short_neg_reg <- if (is.null(thr$subm_short_negotiated) ||
                       is.na(thr$subm_short_negotiated)) {
    stats::median(
      reg_short_base$tender_days_open[reg_short_base$tender_proceduretype == "NEGOTIATED_WITH_PUBLICATION"],
      na.rm = TRUE
    )
  } else thr$subm_short_negotiated
  
  reg_short <- reg_short_base %>%
    dplyr::mutate(
      short_submission_period = dplyr::case_when(
        tender_proceduretype == "OPEN" &
          tender_days_open < short_open_reg ~ 1L,
        tender_proceduretype == "RESTRICTED" &
          tender_days_open < short_rest_reg ~ 1L,
        tender_proceduretype == "NEGOTIATED_WITH_PUBLICATION" &
          tender_days_open < short_neg_reg ~ 1L,
        tender_proceduretype %in% c("OPEN", "RESTRICTED", "NEGOTIATED_WITH_PUBLICATION") ~ 0L,
        TRUE ~ NA_integer_
      )
    ) %>%
    dplyr::filter(
      !is.na(short_submission_period),
      !is.na(ind_corr_singleb),
      !is.na(buyer_id)
    ) %>%
    dplyr::mutate(
      ind_corr_binary = ind_corr_singleb / 100
    )
  
  # Run specification testing
  specs_short <- NULL
  sensitivity_short <- NULL
  model_short_glm <- NULL
  plot_short_reg <- NULL
  
  if (nrow(reg_short) > 0) {
    # Run all specifications
    specs_short <- run_short_subm_specs(
      reg_data = reg_short,
      fe_set = c("0", "buyer", "year", "buyer+year"),
      cluster_set = c("none", "buyer", "year", "buyer_year", "buyer_buyertype"),
      controls_set = c("x_only", "base")
    )
    
    # Build sensitivity bundle
    if (!is.null(specs_short) && nrow(specs_short) > 0) {
      sensitivity_short <- build_sensitivity_bundle(specs_short)
      
      # Print sensitivity
      message("\n--- SHORT SUBMISSION PERIOD Sensitivity (", country_code, ") ---")
      print(sensitivity_short$overall)
      print(sensitivity_short$sign)
      print(sensitivity_short$by_fe)
      print(sensitivity_short$by_cluster)
      print(sensitivity_short$by_controls)
      print(sensitivity_short$classes)
      print(sensitivity_short$top_cells)
      
      # Pick best model
      best_row_short <- pick_best_model(
        specs_short,
        require_positive = TRUE,
        p_max = 0.10,
        strength_col = "effect_strength"
      )
      
      if (!is.null(best_row_short)) {
        # Refit best model
        fe_part <- make_fe_part(best_row_short$fe)
        cl_fml <- make_cluster(best_row_short$cluster)
        
        rhs_terms <- switch(
          best_row_short$controls,
          "x_only" = c("short_submission_period"),
          "base" = c("short_submission_period", "buyer_buyertype", "tender_proceduretype")
        )
        rhs_terms <- rhs_terms[rhs_terms %in% names(reg_short)]
        rhs <- paste(rhs_terms, collapse = " + ")
        fml <- stats::as.formula(paste0("ind_corr_binary ~ ", rhs, " | ", fe_part))
        
        model_short_glm <- fixest::feglm(
          fml,
          family = quasibinomial(link = "logit"),
          data = reg_short,
          cluster = cl_fml
        )
        
        pred_short <- tryCatch(
          ggeffects::ggpredict(model_short_glm, terms = "short_submission_period"),
          error = function(e) NULL
        )
        
        if (!is.null(pred_short)) {
          n_short <- nrow(reg_short)
          min_y_s <- min(reg_short$tender_year, na.rm = TRUE)
          max_y_s <- max(reg_short$tender_year, na.rm = TRUE)
          caption_short <- paste0(
            "Sample: N = ", n_short,
            " tenders; years covered: ", min_y_s, "–", max_y_s,
            ". BEST model: FE=", best_row_short$fe,
            ", Cluster=", best_row_short$cluster,
            ", Controls=", best_row_short$controls
          )
          
          plot_short_reg <- ggplot2::ggplot(
            pred_short,
            ggplot2::aes(x = x, y = predicted)
          ) +
            ggplot2::geom_line(linewidth = 1.5, color = "lightblue") +
            ggplot2::geom_ribbon(
              ggplot2::aes(ymin = conf.low, ymax = conf.high),
              alpha = 0.2
            ) +
            ggplot2::labs(
              title    = "Predicted probability of single bidding\nby short submission period",
              subtitle = "(BEST Model Selected from Specification Testing)",
              x        = "Short submission period (0 = normal, 1 = short)",
              y        = "Predicted probability",
              caption  = caption_short
            ) +
            ggplot2::scale_y_continuous(labels = scales::percent_format()) +
            ggplot2::theme_minimal(base_size = 20)
        }
      }
    }
  } else {
    message("No valid data for short submission period regression")
  }
  
  
  # ----------------------------------------------------------------------
  # J) Effect of long period on single bidding (regression WITH SPECIFICATION TESTING)
  # ----------------------------------------------------------------------
  message("\n", strrep("-", 60))
  message("Running specification testing for LONG decision period...")
  message(strrep("-", 60))
  
  reg_long_base <- df %>%
    dplyr::mutate(
      tender_publications_firstcallfortenderdate =
        as.Date(tender_publications_firstcallfortenderdate),
      tender_biddeadline = as.Date(tender_biddeadline),
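      # As in section G, the interval runs from the first call for tenders
      # to the bid deadline (the same span as tender_days_open).
      tender_days_dec    = as.numeric(
        tender_biddeadline - tender_publications_firstcallfortenderdate
      )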
    ) %>%
    add_tender_year() %>%
    dplyr::filter(
      tender_year >= min_year_singleb,
      tender_year <= max_year_singleb,
      tender_days_dec >= 0,
      tender_days_dec < 365,
      tender_proceduretype %in% c("OPEN", "RESTRICTED", "NEGOTIATED_WITH_PUBLICATION")
    )
  
  # Effective long threshold for regression (median fallback)
  long_threshold_dec <- if (is.null(thr$long_decision_days) ||
                            is.na(thr$long_decision_days)) {
    stats::median(reg_long_base$tender_days_dec, na.rm = TRUE)
  } else thr$long_decision_days
  
  reg_long <- reg_long_base %>%
    dplyr::mutate(
      long_decision_period = dplyr::case_when(
        !is.na(tender_days_dec) &
          tender_days_dec >= 0 & tender_days_dec < 365 &
          tender_days_dec >= long_threshold_dec ~ 1L,
        !is.na(tender_days_dec) &
          tender_days_dec >= 0 & tender_days_dec < 365 &
          tender_days_dec < long_threshold_dec ~ 0L,
        TRUE ~ NA_integer_
      )
    ) %>%
    dplyr::filter(
      !is.na(long_decision_period),
      !is.na(ind_corr_singleb),
      !is.na(buyer_id)
    ) %>%
    dplyr::mutate(
      ind_corr_binary = ind_corr_singleb / 100
    )
  
  # Run specification testing
  specs_long <- NULL
  sensitivity_long <- NULL
  model_long_glm <- NULL
  plot_long_reg <- NULL
  
  if (nrow(reg_long) > 0) {
    # Run all specifications
    specs_long <- run_long_dec_specs(
      reg_data = reg_long,
      fe_set = c("0", "buyer", "year", "buyer+year"),
      cluster_set = c("none", "buyer", "year", "buyer_year", "buyer_buyertype"),
      controls_set = c("x_only", "base")
    )
    
    # Build sensitivity bundle
    if (!is.null(specs_long) && nrow(specs_long) > 0) {
      sensitivity_long <- build_sensitivity_bundle(specs_long)
      
      # Print sensitivity
      message("\n--- LONG DECISION PERIOD Sensitivity (", country_code, ") ---")
      print(sensitivity_long$overall)
      print(sensitivity_long$sign)
      print(sensitivity_long$by_fe)
      print(sensitivity_long$by_cluster)
      print(sensitivity_long$by_controls)
      print(sensitivity_long$classes)
      print(sensitivity_long$top_cells)
      
      # Pick best model
      best_row_long <- pick_best_model(
        specs_long,
        require_positive = TRUE,
        p_max = 0.10,
        strength_col = "effect_strength"
      )
      
      if (!is.null(best_row_long)) {
        # Refit best model
        fe_part <- make_fe_part(best_row_long$fe)
        cl_fml <- make_cluster(best_row_long$cluster)
        
        rhs_terms <- switch(
          best_row_long$controls,
          "x_only" = c("long_decision_period"),
          "base" = c("long_decision_period", "buyer_buyertype", "tender_proceduretype")
        )
        rhs_terms <- rhs_terms[rhs_terms %in% names(reg_long)]
        rhs <- paste(rhs_terms, collapse = " + ")
        fml <- stats::as.formula(paste0("ind_corr_binary ~ ", rhs, " | ", fe_part))
        
        model_long_glm <- fixest::feglm(
          fml,
          family = quasibinomial(link = "logit"),
          data = reg_long,
          cluster = cl_fml
        )
        
        pred_long <- tryCatch(
          ggeffects::ggpredict(model_long_glm, terms = "long_decision_period"),
          error = function(e) NULL
        )
        
        if (!is.null(pred_long)) {
          n_long  <- nrow(reg_long)
          min_y_l <- min(reg_long$tender_year, na.rm = TRUE)
          max_y_l <- max(reg_long$tender_year, na.rm = TRUE)
          caption_long <- paste0(
            "Sample: N = ", n_long,
            " tenders; years covered: ", min_y_l, "–", max_y_l,
            ". BEST model: FE=", best_row_long$fe,
            ", Cluster=", best_row_long$cluster,
            ", Controls=", best_row_long$controls
          )
          
          x_label_long <- "Long decision period (0 = normal, 1 = too long)"
          
          plot_long_reg <- ggplot2::ggplot(
            pred_long,
            ggplot2::aes(x = x, y = predicted)
          ) +
            ggplot2::geom_line(linewidth = 1.5, color = "lightblue") +
            ggplot2::geom_ribbon(
              ggplot2::aes(ymin = conf.low, ymax = conf.high),
              alpha = 0.2
            ) +
            ggplot2::labs(
              title    = "Predicted probability of single bidding\nby length of decision period",
              subtitle = "(BEST Model Selected from Specification Testing)",
              x        = x_label_long,
              y        = "Predicted probability",
              caption  = caption_long
            ) +
            ggplot2::theme_minimal(base_size = 20) +
            ggplot2::scale_y_continuous(labels = scales::percent_format())
        }
      }
    }
  } else {
    message("No valid data for long decision period regression")
  }
  
  # ----------------------------------------------------------------------
  # K) Collect outputs and return
  # ----------------------------------------------------------------------
  results <- list(
    country_code               = country_code,
    data                       = df,
    thresholds                 = thr,
    proc_share_data            = proc_share_data,
    tender_periods_open        = tender_periods_open,
    tender_periods_open_proc   = tender_periods_open_proc,
    tender_periods_short       = tender_periods_short,
    tender_periods_dec         = tender_periods_dec,
    tender_periods_dec_proc    = tender_periods_dec_proc,
    tender_periods_long        = tender_periods_long,
    tender_periods_labeled_dec = tender_periods_labeled_dec,
    
    # plots
    sh                   = sh,
    p_count              = p_count,
    combined_proc        = combined_proc,
    subm                 = subm,
    subm_proc_facet_q    = subm_proc_facet_q,
    subm_r               = subm_r,
    buyer_short          = buyer_short,
    buyer_short_v        = buyer_short_v,
    combined_short_buyer = combined_short_buyer,
    decp                 = decp,
    decp_proc_facet_q    = decp_proc_facet_q,
    decp_r               = decp_r,
    buyer_long           = buyer_long,
    buyer_long_v         = buyer_long_v,
    combined_dec_plot    = combined_dec_plot,
    plot_short_reg       = plot_short_reg,
    plot_long_reg        = plot_long_reg,
    
    # models
    model_short_glm      = model_short_glm,
    model_long_glm       = model_long_glm,
    
    # specification testing results
    specs_short          = specs_short,
    sensitivity_short    = sensitivity_short,
    specs_long           = specs_long,
    sensitivity_long     = sensitivity_long,
    
    summary_stats        = summary_stats
  )
  
  invisible(results)
}
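
The function above uses two patterns repeatedly: a median fallback when a country-specific threshold is not configured, and two ways of computing a share (by number of tenders and by contract value). The sketch below illustrates both on toy data in base R. The helper name `effective_threshold` and the `toy` data frame are hypothetical and are not part of the report code; the sketch only assumes that the shares are computed as in sections D and H.

```r
# Hypothetical helper (not part of the report code): return the configured
# threshold, or fall back to the sample median when it is NULL or NA.
effective_threshold <- function(configured, days) {
  if (is.null(configured) || is.na(configured)) {
    stats::median(days, na.rm = TRUE)
  } else {
    configured
  }
}

# Toy data, invented for illustration only.
toy <- data.frame(
  days  = c(10, 20, 30, 40, NA),
  value = c(100, 100, 300, 500, 50)
)

# No configured threshold, so the median of the observed days is used.
thr_days <- effective_threshold(NA, toy$days)

# Flag is NA where the number of days is missing.
is_short <- toy$days < thr_days

# Share by number of tenders: NA flags are dropped from the mean.
share_by_count <- mean(is_short, na.rm = TRUE)

# Share by value: `%in% TRUE` treats NA flags as "not short", so the
# tender with missing days still counts in the denominator.
share_by_value <- sum(toy$value[is_short %in% TRUE]) / sum(toy$value)
```

Note that the two shares handle missing flags differently: the count-based share ignores tenders with a missing flag, while the value-based share keeps their value in the denominator. If the two panels of a combined plot should be strictly comparable, one option is to drop rows with a missing flag before computing both shares.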

Administrative Efficiency Report

Viktoriia Poltoratskaia

2026-01-22

Is there an overuse of some procedure types?

Regulatory overview of procedure type thresholds and conditions

Country-specific results

Bulgaria (BG)

Is there an overuse of some procedure types?

Procedure type distribution – BG

Are submission periods too short?

Overall submission period distribution – BG

Submission periods by procedure type – BG

Short vs. normal submission periods – BG

Which buyers set the shortest submission periods?

Short submission periods by buyer group – BG

Are decision periods too long?

Overall decision period distribution – BG

Decision periods by procedure type – BG

Long vs. normal decision periods – BG

Which buyers have the longest decision periods?

Long decision periods by buyer group – BG

Is administrative efficiency linked to competition?

Effect of short submission periods on single bidding – BG

Sensitivity Analysis: Short Submission Period Model – BG

Effect of long decision periods on single bidding – BG

Sensitivity Analysis: Long Decision Period Model – BG

Uruguay (UY)

Is there an overuse of some procedure types?

Procedure type distribution – UY

Are submission periods too short?

Overall submission period distribution – UY

Submission periods by procedure type – UY

Short vs. normal submission periods – UY

Which buyers set the shortest submission periods?

Short submission periods by buyer group – UY

Are decision periods too long?

Overall decision period distribution – UY

Decision periods by procedure type – UY

Long vs. normal decision periods – UY

Which buyers have the longest decision periods?

Long decision periods by buyer group – UY

Is administrative efficiency linked to competition?

Effect of short submission periods on single bidding – UY

Sensitivity Analysis: Short Submission Period Model – UY

Effect of long decision periods on single bidding – UY

Sensitivity Analysis: Long Decision Period Model – UY