feat: split changelogs #2868

Closed
wants to merge 2 commits into from
13 changes: 13 additions & 0 deletions Makefile
@@ -437,6 +437,19 @@ version-sync:
	scripts/version-sync.sh \
		-f "unstructured/__version__.py" semver

## dev-changelog file creation and version updates for dev PRs:
.PHONY: changelog-version
changelog-version:
	./scripts/changelogs/create_dev_changelog_from_branch_name.sh
	python scripts/changelogs/version.py dev

## changelog and version updates for release PRs:
.PHONY: changelog-version-release
changelog-version-release:
	python scripts/changelogs/combine.py changelogs-dev CHANGELOG.md
	./scripts/changelogs/cleanup.sh changelogs-dev
	python scripts/changelogs/version.py release

.PHONY: check-coverage
check-coverage:
	coverage report --fail-under=95
23 changes: 23 additions & 0 deletions README.md
@@ -256,6 +256,29 @@ information on how to report security vulnerabilities.

Encountered a bug? Please create a new [GitHub issue](https://github.com/Unstructured-IO/unstructured/issues/new/choose) and use our bug report template to describe the problem. To help us diagnose the issue, use the `python scripts/collect_env.py` command to gather your system's environment information and include it in your report. Your assistance helps us continuously improve our software - thank you!

## :gear: Contributing

### Versioning/Changelog Guidelines

#### How to make a dev PR:
- Run `make changelog-version`.
- Describe the PR's changes in the auto-created dev changelog file inside the `changelogs-dev` folder.
- Verify that:
  - a new dev changelog (`.md`) file, named after your branch with slashes replaced by dashes, has been created in the `changelogs-dev` folder
  - the file follows the standard template: `changelogs-dev/dev-changelog-template.md`
  - `CHANGELOG.md` is untouched
  - if the PR is the first dev PR after a release PR, the version number in `unstructured/__version__.py` is incremented and a `-dev` suffix is added
  - for any other dev PR, `unstructured/__version__.py` is untouched
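To illustrate the "follows the standard template" check, here is a minimal sketch of a hypothetical `follows_template` helper (not part of this PR's scripts). It assumes a dev changelog conforms when its `###` headings are exactly those of `dev-changelog-template.md`, in the same order:

```python
from typing import List

# Assumption: heading order copied from changelogs-dev/dev-changelog-template.md.
TEMPLATE_HEADINGS = ["### Enhancements", "### Features", "### Fixes"]


def follows_template(text: str, headings: List[str] = TEMPLATE_HEADINGS) -> bool:
    """Hypothetical check: True if the file's "### " headings are exactly the
    template headings, in template order (entries under them are not inspected)."""
    found = [line.strip() for line in text.splitlines() if line.startswith("### ")]
    return found == headings
```

For example, a file that only fills in entries under the existing headings passes, while one with reordered or missing headings does not.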

#### How to make a release PR:
- Run `make changelog-version-release`.
- Verify that:
  - the contents of the `changelogs-dev` folder are combined into `CHANGELOG.md` without repetition or loss; edit manually if needed
  - the `changelogs-dev` folder is empty except for `dev-changelog-template.md`
  - the `-dev` suffix in `unstructured/__version__.py` is removed
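The "without repetition or loss" check above is done by eye. As a sketch only, a hypothetical helper (not part of this PR) could compare the bullet entries of the dev changelog files with the new release section and report what is missing or extra. It assumes every entry is a single `- ` bullet line:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def entry_lines(markdown: str) -> List[str]:
    """Collect bullet entries ("- ...") from a changelog snippet, without the bullet."""
    return [
        line.strip()[2:].strip()
        for line in markdown.splitlines()
        if line.strip().startswith("- ")
    ]


def check_release_section(
    dev_files: Iterable[str], release_section: str
) -> Tuple[List[str], List[str]]:
    """Return (missing, extra): entries lost from the release section, and entries
    that appear there more often than in the dev changelogs (e.g. duplicates)."""
    expected = Counter(entry for text in dev_files for entry in entry_lines(text))
    actual = Counter(entry_lines(release_section))
    missing = sorted((expected - actual).elements())
    extra = sorted((actual - expected).elements())
    return missing, extra
```

Counting entries with `Counter` rather than using sets means a duplicated bullet is reported even though its text also appears in a dev file.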

## :books: Learn more

| Section | Description |
6 changes: 6 additions & 0 deletions changelogs-dev/ahmet-split-dev-changelogs.md
@@ -0,0 +1,6 @@
### Enhancements
- Split dev changelogs into one file per PR, so that PRs whose changelog entries would otherwise conflict can go through merge queues.

### Features

### Fixes
5 changes: 5 additions & 0 deletions changelogs-dev/dev-changelog-template.md
@@ -0,0 +1,5 @@
### Enhancements

### Features

### Fixes
15 changes: 15 additions & 0 deletions scripts/changelogs/cleanup.sh
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

# For each dev commit to the main branch, there should be an individual changelog file
# that represents the changes made in that commit. In each release, all of the changelog files
# should be combined into a single file (CHANGELOG.md); then the individual changelog files should be removed.
# This script is used to remove the individual changelog files.
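# The script takes the changelogs-dev directory (relative to the project root) as an argument.

# Enable extended globbing so that the !(pattern) negation below works
shopt -s extglob

# Refuse to run without an argument; otherwise the pattern below would match files in the project root
CHANGELOG_DIR=${1:?"Usage: $0 <changelogs-dev folder, relative to the project root>"}

SCRIPT_DIR=$(dirname "$(realpath "$0")")
PROJECT_DIR=$(dirname "$(dirname "$SCRIPT_DIR")")

rm "$PROJECT_DIR/$CHANGELOG_DIR/"!(dev-changelog-template.md)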
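For reference, the same cleanup expressed with `pathlib` might look like the sketch below. `cleanup_dev_changelogs` is a hypothetical name, not part of this PR. It only unlinks regular files, so a stray subdirectory is left in place instead of making `rm` report an error:

```python
from pathlib import Path
from typing import List

TEMPLATE_NAME = "dev-changelog-template.md"


def cleanup_dev_changelogs(folder: Path, keep: str = TEMPLATE_NAME) -> List[str]:
    """Delete every regular file in `folder` except `keep`; return the deleted names."""
    deleted = []
    for path in sorted(folder.iterdir()):
        # Skip the template and anything that is not a regular file (e.g. directories)
        if path.is_file() and path.name != keep:
            path.unlink()
            deleted.append(path.name)
    return deleted
```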
132 changes: 132 additions & 0 deletions scripts/changelogs/combine.py
@@ -0,0 +1,132 @@
import collections
import os
import sys
from typing import Dict, List

SUBSECTION_TYPES = ["### Enhancements", "### Features", "### Fixes"]


def ensure_changelog_folder_purity(folder_name):
    """Makes sure that the changelogs-dev folder only contains markdown files. This is to be able
    to raise an explicit error when an unexpected file is in the folder, rather than failing in
    an unexpected way later on."""
    for filename in os.listdir(folder_name):
        if not filename.endswith(".md"):
            raise ValueError(
                f"Found non-markdown changelog file named {filename} in the "
                f"changelog folder {folder_name}. Please ensure that changelogs "
                "are properly formatted."
            )


def parse_subsections(
    file_path, version_to_be_ensured=None, subsection_types=SUBSECTION_TYPES
) -> Dict[str, List[str]]:
    """Parses the subsections of a changelog file, and returns them as a
    dictionary. This is to be able to combine data for each subsection separately, from different
    dev-changelog files. Check SUBSECTION_TYPES constant to see a list of valid subsections."""

    subsections = {}
    current_subsection_type = None

    with open(file_path) as file:
        for line in file:
            if any(subsection_type in line for subsection_type in subsection_types):
                current_subsection_type = line.strip()
                subsections[current_subsection_type] = []
            elif line and not line.isspace() and current_subsection_type:
                # Drop the leading "- " bullet; serialization adds it back
                processed_line = line.strip().lstrip("-").lstrip(" ")
                subsections[current_subsection_type].append(processed_line)

    return subsections


def combine_files(folder_path, release_version) -> Dict[str, List[str]]:
    """Combines the subsections of all changelog files in the folder.
    Keeping one changelog file per PR avoids conflicts in merge queues, and combining them
    automatically saves maintainers from doing it by hand for each release."""
    # Raises if the folder contains anything other than markdown files
    ensure_changelog_folder_purity(folder_path)
    combined_subsections = {}

    for filename in os.listdir(folder_path):
        # The template is not a changelog entry; skip it
        if filename == "dev-changelog-template.md":
            continue
        file_path = os.path.join(folder_path, filename)
        file_subsections = parse_subsections(file_path, version_to_be_ensured=release_version)

        # Combine subsections from this file into the combined dictionary
        for subsection_type, lines in file_subsections.items():
            if subsection_type not in combined_subsections:
                combined_subsections[subsection_type] = []
            combined_subsections[subsection_type].extend(lines)

    return combined_subsections


def serialize_changelog_updates(combined_subsections, release_version):
    """Converts combined subsections dictionary into a markdown string, to be able to update the
    CHANGELOG.md file with the combined set of release notes."""
    changelog_updates = f"## {release_version}"
    for subsection_type, lines in combined_subsections.items():
        changelog_updates += f"\n\n{subsection_type}"
        for line in lines:
            changelog_updates += f"\n- {line}"
    changelog_updates += "\n\n"
    return changelog_updates


def increment_last_version(change_log_file):
    """Increments the last version number in the CHANGELOG.md file, after a previous release."""
    last_version = None
    with open(change_log_file) as file:
        for line in file:
            if line.startswith("## "):
                # Drop the "## " prefix (str.lstrip would strip a set of characters, not a prefix)
                last_version = line[len("## "):].strip()
                break
    if last_version is None:
        raise ValueError(f"No '## <version>' heading found in {change_log_file}.")
    last_version_parts = last_version.split(".")
    last_version_parts[-1] = str(int(last_version_parts[-1]) + 1)
    return ".".join(last_version_parts)


def get_changelog_updates(changelogs_dev_folder, release_version):
    """Combines the subsections of all changelog files in the folder, and returns them as a
    markdown string."""
    combined_subsections = collections.OrderedDict(
        sorted(combine_files(changelogs_dev_folder, release_version).items())
    )
    changelog_updates = serialize_changelog_updates(combined_subsections, release_version)
    return changelog_updates


def update_changelog_file(changelog_md_file_path, changelog_updates):
    """Updates the CHANGELOG.md file with the combined set of release notes."""
    with open(changelog_md_file_path) as file:
        existing_content = file.read()

    with open(changelog_md_file_path, "w") as file:
        file.write(changelog_updates + existing_content)


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(
            "Usage: python combine.py <changelogs-dev folder path> "
            "<CHANGELOG.md file path> "
            "<release_version (optional)>"
        )
        sys.exit(1)

    changelogs_dev_folder, changelog_md_file_path, *optional_args = sys.argv[1:]
    release_version = (
        optional_args[0] if optional_args else increment_last_version(changelog_md_file_path)
    )

    changelog_updates = get_changelog_updates(changelogs_dev_folder, release_version)
    update_changelog_file(changelog_md_file_path, changelog_updates)
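Within each subsection, entries are appended in the order `os.listdir` returns the files, and Python documents that order as arbitrary, so the combined notes (and the order in `expected_updated_CHANGELOG.md`) may depend on the filesystem. A minimal sketch of a deterministic variant follows, assuming sorting by filename is an acceptable order. `combine_sorted` is a hypothetical name, and it matches headings exactly rather than with the substring check in `parse_subsections`. Adopting it would likely mean regenerating the expected test output.

```python
import os
from typing import Dict, List

SUBSECTION_TYPES = ["### Enhancements", "### Features", "### Fixes"]
TEMPLATE_NAME = "dev-changelog-template.md"


def combine_sorted(folder_path: str) -> Dict[str, List[str]]:
    """Like combine_files(), but reads dev changelogs in sorted filename order."""
    combined: Dict[str, List[str]] = {}
    for filename in sorted(os.listdir(folder_path)):
        if not filename.endswith(".md") or filename == TEMPLATE_NAME:
            continue
        current = None  # heading the following entries belong to
        with open(os.path.join(folder_path, filename)) as file:
            for line in file:
                stripped = line.strip()
                if stripped in SUBSECTION_TYPES:
                    current = stripped
                    combined.setdefault(current, [])
                elif stripped and current:
                    # Drop the leading "- " bullet
                    combined[current].append(stripped.lstrip("- ").strip())
    return combined
```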
10 changes: 10 additions & 0 deletions scripts/changelogs/create_dev_changelog_from_branch_name.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

# Get the current branch name, replace all slashes with dashes,
# and create a markdown file named after the branch.
# This script creates the dev changelog file for a dev PR.

branch_name=$(git rev-parse --abbrev-ref HEAD)
modified_branch_name=$(echo "$branch_name" | sed 's/\//-/g')
dev_changelog_name=$modified_branch_name.md

# Don't overwrite notes already written if the script is run again on the same branch
if [ ! -f "changelogs-dev/$dev_changelog_name" ]; then
  cp changelogs-dev/dev-changelog-template.md "changelogs-dev/$dev_changelog_name"
fi
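The branch-name-to-filename step can be sketched in Python as below. Both helpers are hypothetical and not part of this PR; reading the branch name from git (`git rev-parse --abbrev-ref HEAD` in the script) is left to the caller. The existence check keeps a rerun from wiping notes already written to the file.

```python
import shutil
from pathlib import Path


def dev_changelog_path(branch_name: str, folder: Path = Path("changelogs-dev")) -> Path:
    """Map a branch name to its dev changelog path, replacing every "/" with "-"."""
    return folder / (branch_name.replace("/", "-") + ".md")


def create_dev_changelog(branch_name: str, folder: Path = Path("changelogs-dev")) -> Path:
    """Copy the template for this branch unless a dev changelog already exists."""
    target = dev_changelog_path(branch_name, folder)
    if not target.exists():
        shutil.copyfile(folder / "dev-changelog-template.md", target)
    return target
```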
75 changes: 75 additions & 0 deletions scripts/changelogs/version.py
@@ -0,0 +1,75 @@
import sys


def get_last_version(version_file):
    """Reads the last version from the version file."""
    with open(version_file) as file:
        last_version = file.read().split('"')[1]
    return last_version


def last_version_is_dev(last_version):
    """Used to determine if the last version is a development version."""
    return "dev" in last_version


def increment_last_release_version(last_version):
    """Used to determine the next version when the last version is a release version."""
    last_version_parts = last_version.split(".")
    last_version_parts[-1] = str(int(last_version_parts[-1]) + 1)
    return ".".join(last_version_parts)


def update_version_file(version_file, version):
    """Updates the version file with the new version."""
    with open(version_file, "w") as file:
        file.write(f'__version__ = "{version}"  # pragma: no cover\n')


def get_next_version(next_version_type, last_version):
    """Returns the next version string. Exits the process instead when no change is needed
    or when the requested transition has to be handled manually."""
    if next_version_type == "dev":
        if last_version_is_dev(last_version):
            # Version file remains the same with a -dev suffix
            sys.exit(0)
        else:
            # We increment the last release version and add a -dev suffix
            next_version = increment_last_release_version(last_version) + "-dev"
            return next_version

    elif next_version_type == "release":
        if last_version_is_dev(last_version):
            # We remove the -dev suffix
            next_version = last_version.split("-")[0]
            return next_version

        else:
            # Two release versions in a row is an edge case, where we cannot be sure what the
            # expected behavior is. In this case, we ask for manual intervention.
            print(
                "You are trying to make a release version when the last version is also a release "
                "version. Please handle all file modifications manually."
            )
            sys.exit(1)

    else:
        print("Usage: python version.py <next_version_type (dev or release)> ")
        sys.exit(1)


if __name__ == "__main__":
    # This script gets the next version type (dev or release) from the user, and makes
    # the necessary changes to the version file (if any changes are needed).

    if len(sys.argv) == 1:
        print("Usage: python version.py <next version type (dev or release)> ")
        sys.exit(1)

    OUTPUT_FILE = "unstructured/__version__.py"

    # Determines whether the next version gets a -dev suffix,
    # and whether the version number needs to be incremented
    next_version_type = sys.argv[1]

    last_version = get_last_version(OUTPUT_FILE)
    next_version = get_next_version(next_version_type, last_version)
    update_version_file(OUTPUT_FILE, next_version)
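`get_next_version` ends the process with `sys.exit()` in several branches, which makes it awkward to unit-test. A side-effect-free variant could return `None` for "leave the file as-is" and raise for unsupported transitions, leaving exit codes to the `__main__` block. This is a sketch under those assumptions; `next_version` is a hypothetical name, and it follows the same rules as `version.py`:

```python
from typing import Optional


def next_version(kind: str, last_version: str) -> Optional[str]:
    """Return the new version string, or None when the version file should stay as-is."""
    is_dev = "dev" in last_version
    if kind == "dev":
        if is_dev:
            return None  # already a -dev version: leave the file untouched
        parts = last_version.split(".")
        parts[-1] = str(int(parts[-1]) + 1)
        return ".".join(parts) + "-dev"
    if kind == "release":
        if is_dev:
            return last_version.split("-")[0]  # drop the -dev suffix
        raise ValueError("last version is already a release; update files manually")
    raise ValueError(f"unknown version type {kind!r}; expected 'dev' or 'release'")
```

The `__main__` block would then write the version file only when the result is not `None`, and turn a `ValueError` into a printed message and exit code 1.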
18 changes: 18 additions & 0 deletions test_scripts/test-combine-changelogs.sh
@@ -0,0 +1,18 @@
#!/usr/bin/env bash

# For each dev commit to the main branch, there should be an individual changelog file
# that represents the changes made in that commit. In each release, all of the changelog files
# should be combined into a single file (CHANGELOG.md); then the individual changelog files should be removed.
# This script tests the combine.py changelog-combining functionality.

set -e

SCRIPT_DIR=$(dirname "$(realpath "$0")")
PROJECT_DIR=$(dirname "$SCRIPT_DIR")
ASSETS_DIR=$SCRIPT_DIR/test_assets/test_combine_changelogs

# Restore the fixture even if the comparison below fails (set -e would otherwise skip it)
trap 'git checkout "$ASSETS_DIR/test_CHANGELOG_do_not_update.md"' EXIT

python "$PROJECT_DIR/scripts/changelogs/combine.py" "$ASSETS_DIR/changelogs-dev" "$ASSETS_DIR/test_CHANGELOG_do_not_update.md"

# Check if the changelog was combined correctly
diff "$ASSETS_DIR/test_CHANGELOG_do_not_update.md" "$ASSETS_DIR/expected_updated_CHANGELOG.md"
@@ -0,0 +1,7 @@
### Enhancements

### Features

- Test dev changelog file 1 (total) line 7

### Fixes
@@ -0,0 +1,10 @@
### Enhancements

### Features

- Test dev changelog file 2 (total) line 8
- Test dev changelog file 2 (total) line 9

### Fixes

- Test dev changelog file 2 (total) line 10
@@ -0,0 +1,5 @@
### Enhancements

### Features

### Fixes
@@ -0,0 +1,7 @@
### Enhancements

### Features

- Test dev changelog file 4 (total) line 12

### Fixes
@@ -0,0 +1,35 @@
## 0.0.2

### Enhancements

### Features
- Test dev changelog file 2 (total) line 8
- Test dev changelog file 2 (total) line 9
- Test dev changelog file 4 (total) line 12
- Test dev changelog file 1 (total) line 7

### Fixes
- Test dev changelog file 2 (total) line 10

## 0.0.1

### Enhancements

- Test changelog file line 3

### Features

- Test changelog file line 4
- Test changelog file line 5

### Fixes

- Test changelog file line 6

## 0.0.0

### Enhancements, Features, Fixes

- Test changelog file line 1
- Test changelog file line 2