This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

Contribution Distribution Engine (CDE) #164

Open
kikass13 opened this issue Sep 8, 2020 · 10 comments

@kikass13
Contributor

kikass13 commented Sep 8, 2020

After our latest meeting, I was playing around with the future weighting stuff and some random git helper scripts based on git blame.

I will continue with the stuff I have visualized in my newest fancy draft. It doesn't really represent anything final yet, but it should show my intentions and definitions going forward.

SORRY FOR SPELLING MISTAKES IN THIS TEXT, IT'S 3 AM DAMMIT! :D

contribution_types_domains_draft

The following definitions are used:

  • Contribution Distribution Engine (CDE)

    • engine for configuring domains, their desired metrics and the resulting types
    • will replace/take over the current gather/weight/split functionality with a dynamic, freely configurable framework
  • Contribution Type (probably 1:1 with metrics, but I'm not sure)

    • the source of information about the contributions that were made
    • each metric has to be classified as one of these types: simple metrics fall into exactly one type, while complex metrics will need their own type, depending on the information they require
  • Contribution Domain

    • These are just labels for whatever the user wants to represent with metrics.
    • Domains can therefore be seen as a combination of metrics for a specific purpose (e.g. distributing money to a specific kind of developer)
    • highly configurable and arbitrary: the repository owner can define pretty much whatever he wants (weights, metrics, special flags)
    • Because metrics have to be "assigned" to a contribution domain to work, domains have to reference the contribution types above (i.e. where each metric gets its information from)

how does it work:

  1. The project owner (PO) writes selery.yml, defining the contribution domains and their weights. He also has to choose which contribution types/metrics apply and configure each one individually.
  2. He could, for example, create a domain called "Documentation" with the "Files" type and tell the engine to assign a weight to every contributor who works on *.md files in the repository, let's say 20 people.
  3. After a successful pull request, we know every contributor to that "Documentation" domain (including how much each one contributed and when) and have to figure out how eligible each of those 20 contributors is to "earn" money relative to the others (inside that specific domain). That's where the metrics defined by the PO come in.
  4. For example, our PO could configure a metric which gives the highest payout/probability to the contributors who wrote the most (added the most lines). The metric therefore changes the weights from an equal distribution (1/20 for every contributor) to a distribution based on the lines written by each contributor.
  5. Hurrah! We can now use our new weights to pay out our contributors as normal.
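The re-weighting in steps 3 to 5 can be sketched in a few lines. This is a minimal illustration, not CDE code: `uniform_weights` and `lines_weights` are hypothetical helper names, and the contributor names and line counts are made up.

```python
def uniform_weights(contributors):
    """Equal share for every contributor in a domain (1/N each)."""
    if not contributors:
        return {}
    return {c: 1.0 / len(contributors) for c in contributors}


def lines_weights(lines_added):
    """Share proportional to the number of lines each contributor added."""
    total = sum(lines_added.values())
    if total == 0:
        # nothing measurable was added: fall back to the equal split
        return uniform_weights(list(lines_added))
    return {c: n / total for c, n in lines_added.items()}


if __name__ == "__main__":
    # hypothetical "Documentation" domain: lines added to *.md files
    docs = {"alice": 60, "bob": 30, "carol": 10}
    print(uniform_weights(list(docs)))  # 1/3 each
    print(lines_weights(docs))          # {'alice': 0.6, 'bob': 0.3, 'carol': 0.1}
```

Both functions return weights that sum to 1, so the payout step afterwards does not need to care which metric produced them.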
@kikass13
Contributor Author

kikass13 commented Sep 8, 2020

Here's my first example of file-specific information gathering. I am using git blame to extract all the (hopefully) useful information about local files and their "touches" (i.e. who changed how many lines at which point in time). My script + git blame output the following:

  • a list of files in the git repo (under version control)
    • for each file, who has contributed (author name for now) and how many lines they added
      • for each author, the script outputs the amount of line changes at each specific point in time (commit stamp)

Here's my example:

git ls-files | while IFS= read -r f; do printf '\n%s\n' "$f"; git blame -C -C -C --line-porcelain -- "$f" | tests/pythonGitHelper.py; done

And here's the output:

.git-blame-ignore-revs
Arne Döring [2]
  -- 2020-08-14/16:37:25 [2]

.github/FUNDING.yml
Tobias Augspurger [3]
  -- 2020-08-20/15:19:05 [1]
  -- 2020-08-20/14:56:17 [2]

.github/workflows/black.yml
Arne Döring [11]
  -- 2020-08-14/16:08:49 [1]
  -- 2020-08-14/14:34:01 [10]

.github/workflows/seleryaction.yml
Tobias Augspurger [44]
  -- 2020-02-08/11:23:48 [7]
  -- 2020-08-22/18:35:19 [1]
  -- 2020-07-16/16:10:23 [4]
  -- 2020-08-22/18:40:54 [2]
  -- 2020-02-08/11:20:39 [6]
  -- 2020-08-20/12:01:43 [2]
  -- 2020-08-11/22:29:39 [7]
  -- 2020-08-13/18:20:38 [8]
  -- 2020-02-08/11:20:04 [3]
  -- 2020-08-19/09:26:19 [3]
  -- 2020-08-22/18:48:52 [1]
T0b14s Augspurger [71]
  -- 2020-07-23/15:59:06 [2]
  -- 2020-07-22/14:16:37 [1]
  -- 2020-02-08/11:23:48 [2]
  -- 2020-07-17/17:48:09 [1]
  -- 2020-02-08/11:20:39 [3]
  -- 2020-03-28/08:33:31 [9]
  -- 2020-07-23/14:14:00 [9]
  -- 2020-07-23/16:12:10 [3]
  -- 2020-07-18/09:12:25 [29]
  -- 2020-07-23/17:30:09 [1]
  -- 2020-03-08/18:35:34 [10]
  -- 2020-07-23/16:19:38 [1]
johannes karoff [1]
  -- 2020-07-31/16:55:59 [1]

.gitignore
Arne Döring [3]
  -- 2020-08-17/15:52:19 [3]
Nick Fiege [3]
  -- 2020-02-08/11:23:48 [3]
Tobias Augspurger [120]
  -- 2020-03-15/09:39:56 [3]
  -- 2020-02-08/11:20:39 [3]
  -- 2020-02-08/11:23:48 [1]
  -- 2020-02-29/09:45:42 [1]
  -- 2020-08-19/18:01:33 [3]
  -- 2020-02-10/22:39:29 [1]
  -- 2020-02-08/11:20:04 [108]
johannes karoff [1]
  -- 2020-07-22/16:24:14 [1]

Dockerfile
T0b14s Augspurger [3]
  -- 2020-02-28/22:24:07 [1]
  -- 2020-02-28/21:43:44 [2]
Tobias Augspurger [25]
  -- 2020-02-24/22:10:17 [4]
  -- 2020-02-28/23:47:20 [2]
  -- 2020-02-08/11:20:04 [14]
  -- 2020-02-08/11:20:39 [5]
kikass13 [20]
  -- 2020-08-06/22:38:34 [20]

Gemfile
Hendrik Radke [1]
  -- 2020-03-21/15:52:01 [1]
Tobias Augspurger [2]
  -- 2020-02-08/11:20:04 [1]
  -- 2020-02-24/22:10:17 [1]

LICENSE
Tobias Augspurger [661]
  -- 2020-02-08/11:20:04 [661]

README.md
Arne Döring [6]
  -- 2020-08-16/19:19:06 [6]
T0b14s Augspurger [12]
  -- 2020-03-22/08:40:18 [2]
  -- 2020-02-16/14:02:36 [1]
  -- 2020-07-03/00:02:46 [1]
  -- 2020-03-21/09:04:05 [1]
  -- 2020-02-08/11:20:39 [3]
  -- 2020-07-26/18:57:52 [2]
  -- 2020-02-24/20:33:17 [2]
Hendrik Radke [1]
  -- 2020-03-21/15:52:01 [1]
Felix Dietze [29]
  -- 2020-08-21/19:51:51 [29]
Tobias Augspurger [122]
  -- 2020-08-14/17:05:22 [2]
  -- 2020-08-17/08:38:30 [1]
  -- 2020-08-14/20:27:52 [3]
  -- 2020-08-22/08:50:51 [11]
  -- 2020-02-08/11:20:04 [3]
  -- 2020-08-14/15:01:23 [3]
  -- 2020-08-06/10:57:33 [1]
  -- 2020-08-22/18:25:36 [1]
  -- 2020-08-15/10:33:53 [15]
  -- 2020-08-16/12:32:28 [7]
  -- 2020-02-24/22:29:28 [1]
  -- 2020-02-08/11:23:48 [2]
  -- 2020-08-14/23:22:56 [1]
  -- 2020-08-16/11:23:35 [9]
  -- 2020-08-16/11:00:25 [9]
  -- 2020-08-13/14:59:30 [1]
  -- 2020-08-22/18:10:53 [3]
  -- 2020-08-19/17:46:07 [22]
  -- 2020-08-15/12:25:25 [4]
  -- 2020-08-21/12:31:53 [8]
  -- 2020-08-22/18:29:35 [1]
  -- 2020-03-14/13:28:49 [14]

build.sh
kikass13 [1]
  -- 2020-02-08/11:23:48 [1]
johannes karoff [1]
  -- 2020-07-31/16:54:50 [1]

docs/OpenSelery-04.png
Traceback (most recent call last):
  File "tests/pythonGitHelper.py", line 15, in <module>
    line = input()
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 349: invalid start byte

docs/selery_workflow.png
Traceback (most recent call last):
  File "tests/pythonGitHelper.py", line 15, in <module>
    line = input()
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 306: invalid start byte

openselery/__init__.py

openselery/coinbase_connector.py
Tobias Augspurger [23]
  -- 2020-02-24/22:10:17 [1]
  -- 2020-02-08/11:20:04 [8]
  -- 2020-08-04/10:32:22 [3]
  -- 2020-02-08/11:23:48 [5]
  -- 2020-03-14/22:46:41 [3]
  -- 2020-07-23/17:24:15 [3]
kikass13 [3]
  -- 2020-02-08/11:23:48 [3]
Johnny CrckMc [4]
  -- 2020-02-08/11:20:38 [4]
Arne Döring [15]
  -- 2020-02-08/11:23:48 [1]
  -- 2020-08-14/16:29:09 [14]

openselery/collection_utils.py
johannes karoff [6]
  -- 2020-07-23/18:02:33 [6]
Arne Döring [10]
  -- 2020-08-14/16:29:09 [10]

openselery/commandline.py
kikass13 [4]
  -- 2020-02-08/11:23:48 [1]
  -- 2020-08-06/22:49:10 [3]
Tobias Augspurger [9]
  -- 2020-08-12/00:14:44 [2]
  -- 2020-08-10/15:27:45 [1]
  -- 2020-02-24/22:10:17 [1]
  -- 2020-08-11/19:17:41 [1]
  -- 2020-08-10/13:34:45 [2]
  -- 2020-02-10/22:39:29 [2]
johannes karoff [37]
  -- 2020-07-23/18:02:33 [3]
  -- 2020-07-31/17:18:18 [8]
  -- 2020-07-31/16:48:13 [26]
Arne Döring [77]
  -- 2020-02-08/11:23:48 [9]
  -- 2020-08-14/16:29:09 [64]
  -- 2020-08-17/15:52:19 [4]

openselery/commit_identifier.py
johannes karoff [40]
  -- 2020-08-19/17:33:24 [40]

openselery/configuration.py
Nick Fiege [5]
  -- 2020-02-08/11:23:48 [5]
Arne Döring [7]
  -- 2020-08-14/16:29:09 [6]
  -- 2020-02-08/11:23:48 [1]
Tobias Augspurger [45]
  -- 2020-08-20/14:01:56 [45]
johannes karoff [24]
  -- 2020-08-19/17:33:24 [1]
  -- 2020-07-31/16:48:13 [19]
  -- 2020-07-31/17:18:18 [4]
kikass13 [47]
  -- 2020-08-17/22:20:38 [31]
  -- 2020-02-08/11:23:48 [7]
  -- 2020-08-18/18:06:09 [9]

openselery/git_utils.py
johannes karoff [11]
  -- 2020-08-19/17:33:24 [11]
Tobias Augspurger [39]
  -- 2020-03-15/11:48:26 [1]
  -- 2020-08-20/14:01:56 [8]
  -- 2020-02-08/11:23:48 [19]
  -- 2020-02-29/10:14:33 [11]
Arne Döring [4]
  -- 2020-08-14/16:29:09 [4]

openselery/github_connector.py
Tobias Augspurger [24]
  -- 2020-02-08/11:23:48 [7]
  -- 2020-03-08/20:29:37 [9]
  -- 2020-03-08/21:15:20 [1]
  -- 2020-08-05/13:37:11 [3]
  -- 2020-03-08/21:03:24 [3]
  -- 2020-02-08/11:20:04 [1]
johannes karoff [5]
  -- 2020-07-30/13:41:13 [5]
Arne Döring [29]
  -- 2020-02-08/11:23:48 [16]
  -- 2020-02-09/03:32:20 [9]
  -- 2020-08-14/16:29:09 [4]
kikass13 [27]
  -- 2020-02-08/11:23:48 [27]
Johnny CrckMc [1]
  -- 2020-02-08/11:20:38 [1]
Nick Fiege [12]
  -- 2020-02-08/11:23:48 [12]

openselery/librariesio_connector.py
kikass13 [47]
  -- 2020-02-08/11:23:48 [47]
Nick Fiege [5]
  -- 2020-02-08/11:23:48 [5]
Tobias Augspurger [4]
  -- 2020-02-08/11:20:04 [4]
Not Committed Yet [1]
  -- 2020-09-08/02:01:06 [1]
Arne Döring [40]
  -- 2020-02-08/11:23:48 [23]
  -- 2020-08-14/16:29:09 [17]

openselery/openselery.py
Nick Fiege [40]
  -- 2020-02-08/11:23:48 [40]
kikass13 [88]
  -- 2020-08-18/18:06:09 [4]
  -- 2020-08-06/22:49:10 [36]
  -- 2020-02-08/11:23:48 [48]
johannes karoff [26]
  -- 2020-07-31/15:31:50 [1]
  -- 2020-07-23/18:02:33 [2]
  -- 2020-07-30/13:41:13 [1]
  -- 2020-07-31/16:48:13 [5]
  -- 2020-08-19/17:33:24 [15]
  -- 2020-08-13/19:39:00 [2]
T0b14s Augspurger [3]
  -- 2020-02-24/20:33:17 [3]
Arne Döring [224]
  -- 2020-02-09/03:32:20 [1]
  -- 2020-08-17/16:02:01 [7]
  -- 2020-08-14/16:29:09 [190]
  -- 2020-02-08/11:23:48 [21]
  -- 2020-08-17/15:52:19 [5]
Tobias Augspurger [248]
  -- 2020-03-08/18:29:20 [2]
  -- 2020-07-23/15:45:31 [1]
  -- 2020-02-29/10:14:33 [1]
  -- 2020-03-15/09:39:56 [2]
  -- 2020-08-22/18:25:36 [21]
  -- 2020-08-10/00:52:14 [18]
  -- 2020-08-04/11:25:56 [1]
  -- 2020-08-15/08:54:03 [1]
  -- 2020-08-10/15:27:45 [3]
  -- 2020-08-19/17:46:07 [1]
  -- 2020-08-20/16:55:02 [1]
  -- 2020-08-10/09:42:57 [16]
  -- 2020-08-04/10:32:22 [11]
  -- 2020-02-10/22:39:29 [4]
  -- 2020-08-13/12:10:24 [7]
  -- 2020-02-16/19:58:40 [7]
  -- 2020-08-22/11:35:18 [10]
  -- 2020-08-20/14:01:56 [10]
  -- 2020-08-22/18:10:53 [1]
  -- 2020-03-15/20:28:36 [5]
  -- 2020-08-09/12:38:54 [8]
  -- 2020-02-08/11:23:48 [16]
  -- 2020-03-02/19:51:12 [1]
  -- 2020-08-12/15:41:36 [11]
  -- 2020-02-16/16:50:11 [3]
  -- 2020-07-17/15:08:11 [7]
  -- 2020-08-11/19:17:41 [5]
  -- 2020-08-09/12:39:21 [5]
  -- 2020-03-14/23:12:12 [4]
  -- 2020-08-17/14:13:58 [9]
  -- 2020-08-20/15:35:58 [9]
  -- 2020-02-25/23:47:31 [7]
  -- 2020-08-10/13:34:45 [10]
  -- 2020-08-09/23:36:16 [8]
  -- 2020-07-22/13:37:33 [2]
  -- 2020-07-23/17:24:15 [1]
  -- 2020-02-24/22:10:17 [4]
  -- 2020-03-20/21:00:45 [4]
  -- 2020-03-20/22:16:19 [1]
  -- 2020-08-10/16:33:05 [1]
  -- 2020-07-15/20:53:38 [1]
  -- 2020-08-15/09:39:46 [2]
  -- 2020-08-20/15:42:47 [1]
  -- 2020-03-15/11:48:26 [4]
  -- 2020-08-19/21:44:53 [1]
Not Committed Yet [10]
  -- 2020-09-08/02:01:06 [10]

openselery/os_utils.py
Arne Döring [8]
  -- 2020-08-14/16:29:09 [8]
kikass13 [13]
  -- 2020-08-06/22:40:44 [13]

openselery/ruby_extensions/scan.rb
Tobias Augspurger [16]
  -- 2020-02-08/11:20:04 [16]
Hendrik Radke [1]
  -- 2020-03-21/15:52:01 [1]

openselery/selery_utils.py
Arne Döring [18]
  -- 2020-02-08/11:23:48 [7]
  -- 2020-08-14/16:29:09 [11]
Felix Dietze [2]
  -- 2020-02-08/11:20:39 [2]
Nick Fiege [46]
  -- 2020-02-08/11:23:48 [46]
Tobias Augspurger [6]
  -- 2020-02-08/11:23:48 [1]
  -- 2020-02-08/11:20:04 [1]
  -- 2020-08-09/12:38:54 [4]
kikass13 [20]
  -- 2020-02-08/11:23:48 [20]

openselery/visualization.py
Arne Döring [112]
  -- 2020-08-14/16:29:09 [112]
Tobias Augspurger [2]
  -- 2020-08-05/13:37:11 [1]
  -- 2020-08-05/23:09:25 [1]
johannes karoff [74]
  -- 2020-07-23/18:02:33 [1]
  -- 2020-07-23/17:22:46 [19]
  -- 2020-08-13/19:39:00 [54]
kikass13 [22]
  -- 2020-08-06/22:43:58 [22]

run.sh
johannes karoff [1]
  -- 2020-07-31/16:54:50 [1]
Tobias Augspurger [12]
  -- 2020-02-08/11:20:04 [4]
  -- 2020-02-10/22:39:29 [1]
  -- 2020-03-15/20:28:36 [3]
  -- 2020-02-08/11:20:39 [1]
  -- 2020-08-22/18:10:53 [1]
  -- 2020-02-08/11:23:48 [1]
  -- 2020-03-20/22:16:19 [1]
kikass13 [22]
  -- 2020-08-06/22:38:34 [19]
  -- 2020-02-08/11:23:48 [3]

scripts/selery
Tobias Augspurger [1]
  -- 2020-02-08/11:20:04 [1]
johannes karoff [2]
  -- 2020-08-13/19:38:05 [1]
  -- 2020-07-31/16:48:13 [1]
Arne Döring [2]
  -- 2020-08-17/15:52:19 [1]
  -- 2020-02-08/11:23:48 [1]
Nick Fiege [1]
  -- 2020-02-08/11:23:48 [1]

selery.yml
Tobias Augspurger [37]
  -- 2020-08-15/08:54:03 [3]
  -- 2020-02-10/21:08:47 [11]
  -- 2020-08-22/11:35:18 [9]
  -- 2020-02-10/22:39:29 [1]
  -- 2020-08-22/11:46:21 [1]
  -- 2020-08-11/19:17:41 [5]
  -- 2020-03-08/18:29:20 [1]
  -- 2020-08-10/15:27:45 [2]
  -- 2020-08-13/12:18:48 [1]
  -- 2020-08-10/13:34:45 [3]
johannes karoff [11]
  -- 2020-07-31/15:31:50 [2]
  -- 2020-08-19/17:33:24 [9]
Not Committed Yet [1]
  -- 2020-09-08/02:01:06 [1]
Arne Döring [4]
  -- 2020-02-08/11:23:48 [3]
  -- 2020-08-16/19:19:06 [1]

setup.py
kikass13 [10]
  -- 2020-08-06/15:12:21 [9]
  -- 2020-08-06/22:36:30 [1]
Tobias Augspurger [7]
  -- 2020-08-10/22:49:50 [1]
  -- 2020-02-08/11:20:04 [5]
  -- 2020-08-17/09:14:32 [1]
Arne Döring [25]
  -- 2020-08-14/16:29:09 [25]

tests/just_clone.py
Arne Döring [14]
  -- 2020-08-14/16:29:09 [8]
  -- 2020-02-08/11:23:48 [6]
Tobias Augspurger [18]
  -- 2020-02-08/11:20:04 [18]

tests/random_bibliothecary.py
Arne Döring [46]
  -- 2020-08-14/16:29:09 [14]
  -- 2020-02-08/11:23:48 [32]
Tobias Augspurger [43]
  -- 2020-02-08/11:20:04 [43]

tests/random_clone_docker.sh
Tobias Augspurger [7]
  -- 2020-02-08/11:20:39 [3]
  -- 2020-02-08/11:20:04 [4]

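For now, my helper script fu** up when dealing with binary files (which are not unheard of): git blame itself handles them, but input() cannot decode the non-UTF-8 bytes (see the UnicodeDecodeError for the two PNGs above). And apparently I had uncommitted changes in my working directory while testing this (hence the "Not Committed Yet" entries), meh :)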
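The helper script tests/pythonGitHelper.py is not shown in this thread. The following is a hedged reconstruction of what such a helper could look like, not the original script. It assumes the `--line-porcelain` format (a header block per blamed line with `author` and `author-time` fields, followed by the TAB-prefixed content line) and prints a summary in the format shown above. Reading stdin as bytes and decoding with `errors="replace"` avoids the UnicodeDecodeError on binary files. Dropping git's "Not Committed Yet" pseudo-author is an optional assumption.

```python
import sys
from collections import defaultdict
from datetime import datetime

UNCOMMITTED = "Not Committed Yet"  # author git blame reports for working-tree changes


def parse_line_porcelain(text, skip_uncommitted=True):
    """Map author -> {author-time (unix seconds) -> number of blamed lines}."""
    counts = defaultdict(lambda: defaultdict(int))
    author, author_time = None, None
    # split on "\n" only: str.splitlines() would also break on \f, \x1c, ...
    for line in text.split("\n"):
        if line.startswith("\t"):
            # the TAB-prefixed content line closes the header block of one blamed line
            if not (skip_uncommitted and author == UNCOMMITTED):
                counts[author][author_time] += 1
            author, author_time = None, None
        elif line.startswith("author "):
            author = line[len("author "):]
        elif line.startswith("author-time "):
            author_time = int(line[len("author-time "):])
    return counts


def main():
    # read raw bytes so binary files (PNG etc.) cannot crash the decoder
    text = sys.stdin.buffer.read().decode("utf-8", errors="replace")
    for author, stamps in parse_line_porcelain(text).items():
        print(f"{author} [{sum(stamps.values())}]")
        for ts, n in stamps.items():
            when = datetime.fromtimestamp(ts).strftime("%Y-%m-%d/%H:%M:%S")
            print(f"  -- {when} [{n}]")


if __name__ == "__main__":
    main()
```

Skipping binary files before calling git blame at all would be the other way to avoid the crash.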

@Ly0n
Owner

Ly0n commented Sep 8, 2020

Nice, @kikass13, I think it is a really good approach. I went through the git blame output and it is indeed a good indicator.

will replace/takeover the current gather/weight/split functionalities based on a dynamic / freely configurable framework

When you replace or take over the existing architecture, try to keep the existing functionality or even enhance it. The uniform weights and activity weights are quite important, even if they are not that complex. Today I will start building a demo script to explore the coordination weights. I think the names "file weights" and "coordination weights" are quite good. @krux02 @cornerman @fdietze What is your opinion?

@Ly0n Ly0n added this to the v0.1.0 milestone Sep 8, 2020
@kikass13
Contributor Author

kikass13 commented Sep 8, 2020

@Ly0n
Well, these are not "weights" per se.
They are just a classifier that someone needs to configure what he wants to express...

to make my thought process clear:

  • contribution domains are groups that identify what a contributor has done and how important that was
  • contribution types are just an identifier (a filter of some sort) declaring what sort of actions will put contributors into that specific domain
    • Contributor1 does the following actions A,B,C,E,Z
    • Contributor2 does the following actions T,G,H,A
    • Domain1 is configured to be "triggered" for all contributors doing Action A
    • Domain2 is configured to be "triggered" for all contributors doing Action G and H
    • These Actions are considered a Contribution Type, for example everyone who touches a specific file or everyone who helps with a PR
    • Domain1 now holds Contributor1 & Contributor2
    • Domain2 now holds Contributor2
    • for each domain, there has to be a metric defining "what is fairness here?"
      • Domain1 could be configured to treat everyone "equally", so every contributor gets the weight 1/N = 0.5 [in this example N is 2]
      • Domain1 could also be configured to "reward people with more lines of change". This shows that ACTIONS (Contribution Types) and metrics kind of have a 1:1 connection here (semantically), because there are metrics which can only apply to a specific type and vice versa.
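The example above can be written out directly. This is a hypothetical sketch with plain sets, not CDE code. It reads "Action G and H" literally, so a contributor must have done all of a domain's trigger actions (`triggers <= actions`); swapping that for `triggers & actions` would mean "any of them". Both give the same result here.

```python
# Actions each contributor has done, taken from the example above
performed = {
    "Contributor1": {"A", "B", "C", "E", "Z"},
    "Contributor2": {"T", "G", "H", "A"},
}

# Actions that "trigger" each domain
domain_triggers = {
    "Domain1": {"A"},
    "Domain2": {"G", "H"},
}


def members(performed, triggers):
    """Contributors who did all of the domain's trigger actions."""
    return sorted(c for c, actions in performed.items() if triggers <= actions)


def equal_weights(contributors):
    """The "equal" fairness metric: 1/N for each member of the domain."""
    return {c: 1 / len(contributors) for c in contributors} if contributors else {}


if __name__ == "__main__":
    for domain, triggers in domain_triggers.items():
        print(domain, equal_weights(members(performed, triggers)))
    # Domain1 {'Contributor1': 0.5, 'Contributor2': 0.5}
    # Domain2 {'Contributor2': 1.0}
```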

As you can see, I am a little confused about the role metrics play here. I don't really know (right now I don't have the slightest clue) how we will configure, declare & apply metrics to a contribution domain. If someone has an idea, please give me some insight.

kikass13 added a commit to kikass13/libreselery that referenced this issue Sep 10, 2020
…haves in a proper manner so that the engine can be tested; CDE now accepts optional and mandatory args for domains, actions etc; only email addresses will be used as keys (identifieing contributors) from now on, we could add algorithms for username<>email aggregation later; Ly0n#164
kikass13 added a commit to kikass13/libreselery that referenced this issue Sep 10, 2020
… now access the global libreselery configuration object via self.getGlobals(), which is used by the gitFile plugin to identify the directory to look for files; Added file filter into gitFile plugin so that only files matching the given patterns will be blamed for information Ly0n#164
@kikass13
Contributor Author

kikass13 commented Sep 10, 2020

Disclaimer

To document what I did over the last two days, here's a diagram depicting the data flow.
I will describe the image below... just for the curious: the flow starts at the top left ;)

CDE

What happens:

  • input arg and config management via LibreSeleryConfig class
  • the config object is used to create the LibreSelery instance, which initializes itself, runs its checks, and prepares the output directories and online connections (to various sources)

... here is where the config fun starts ...

  • the config now contains more information regarding the new CDE (ContributionDistributionEngine)
    • a set of domains (ContributionDomain)
      • these are groups of tasks classified under a common purpose
      • for example: Code, Community, Documentation, Art, Bugfixes, Maintenance etc.
      • these labels are arbitrary and a project owner (PO) can configure the groups as he pleases
      • domains also have a weight, which identifies how important that specific domain is in relation to other domains
        • for example: CODE could sometimes be considered more important than DOCUMENTATION, controversial heresy ... I know ;)
    • each domain contains a set of actions (ContributionAction)
      • these are things, or rather tasks, that users could potentially do to be considered working in this domain
      • users who have been identified as having successfully contributed to an action are considered contributors of that specific domain
      • each action has a type attribute, which identifies which plugin (ContributionActionPlugin) will be executed to do all the busy work
        • I will talk about plugins later, keep reading! 🥇
      • each action has a set of (WORK IN PROGRESS) elements, for example a filter which narrows down when the action applies to a potential contributor
        • for example: consider an action which measures users' contributions to files in the file system. Because there are CODE and DOCUMENTATION files, the action has to be configured to only "trigger" for users working on CODE files (.py, .cpp, .h, etc.) and not for other files which shall not be measured inside this domain (we want to differentiate between CODE and DOCUMENTATION, remember?)
      • each action can also have multiple metrics attached
        • metrics define how contributions shall be scored, for example: how many score points shall a user with 100k "lines of code" receive for his lines-of-code-action? Shall the score diminish over time? Shall it be capped at a maximum to avoid abuse?
        • All these questions are a thing for the future because I have no idea -.-'
          - but hey, metrics could be cool right?
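The configuration structure above could be modelled roughly like this. It is a hypothetical sketch: the field names (`type`, `file_filter`, `metrics`) and the use of glob patterns for the filter are assumptions, not the actual LibreSelery classes. "gitFile" is the plugin name used in the commits referencing this issue.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ContributionAction:
    type: str                                         # which plugin does the busy work
    file_filter: list = field(default_factory=list)   # glob patterns; empty = no filter
    metrics: list = field(default_factory=list)       # metric names (still undecided)

    def applies_to(self, path):
        """True if a touched file falls under this action's filter."""
        return not self.file_filter or any(fnmatch(path, p) for p in self.file_filter)


@dataclass
class ContributionDomain:
    name: str
    weight: float          # importance relative to the other domains
    actions: list = field(default_factory=list)


code = ContributionDomain("Code", 1.0, [ContributionAction("gitFile", ["*.py", "*.cpp", "*.h"])])
docs = ContributionDomain("Documentation", 0.5, [ContributionAction("gitFile", ["*.md"])])
```

With this split, the same plugin type can serve both domains, and only the filter decides whether a file counts as CODE or DOCUMENTATION.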

... what happens with all this? ...

  • not so fast ... we are nearly there. First of all, LibreSelery has to initialize the CDE
    • it will automatically initialize all relevant classes mentioned earlier
      • a list of ContributionDomain
      • a list of ContributionAction objects for each domain
      • each ContributionAction will also initialize and load its configured plugin
        • ContributionActionPlugin objects are essentially external python code files.
        • plugins are structured in a way that the CDE can work with them properly but their user code can be altered to fulfill all kinds of tasks.
        • the main task of plugins (for now) is to gather information about contributors regarding a specific action they could have done in the past (for example: adding lines of code to a specific file). They not only have to identify contributors by their tasks but must also score them for what they did.
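A plugin contract along these lines could be expressed as an abstract base class. This is a hedged sketch: the constructor signature, the `params` dict and the `UniformPlugin` example are hypothetical; the only thing taken from the text is that `gather()` returns contributors together with their scores.

```python
from abc import ABC, abstractmethod


class ContributionActionPlugin(ABC):
    """Hypothetical plugin interface: the engine only relies on gather()."""

    def __init__(self, params=None):
        self.params = params or {}

    @abstractmethod
    def gather(self):
        """Return (contributors, scores) as two parallel lists."""


class UniformPlugin(ContributionActionPlugin):
    """Example plugin: gives every known contributor the same base score."""

    def gather(self):
        contributors = list(self.params.get("contributors", []))
        return contributors, [1.0] * len(contributors)


if __name__ == "__main__":
    plugin = UniformPlugin({"contributors": ["alice@example.com", "bob@example.com"]})
    print(plugin.gather())
```

Keeping the engine-facing surface this small is what lets the user code inside `gather()` do "all kinds of tasks".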

... now the CDE is ready and can finally start working? ...

  • the following things will be called for information gathering
    • cde.gather()
    • for domain in domains: domain.gather()
    • for action in domain.actions: action.gather()
      • here the plugin's gather function is called, which is essentially user code and should return a list of contributors and scores
  • now the contributor data of all actions inside a domain will be processed
    • cde.weight()
    • for domain in domains: domain.weight()
      • domain.weight() will also call domain.mangle(), which in turn adds up the contributor scores of all actions
      • domain.weight() then normalizes the scores of all contributors to the range [0, 1]; these are no longer considered scores, but weights
  • lastly the weighted data of all domains will be condensed (merged) into a single value per contributor
    • cde.merge()
      • while merging, each domain's weight parameter is also applied, which scales the impact of specific domains in relation to others
  • the result is a list of contributors and their final weights
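The gather/weight/merge flow above can be sketched with plain dicts standing in for the CDE classes. Assumptions: "normalize to [0, 1]" is read as scaling so the weights sum to 1 (min-max scaling would also fit that description), and domain weights are applied as plain multipliers followed by a final normalization. The function names mirror the steps in the text but are not the actual implementation.

```python
from collections import defaultdict


def mangle(action_results):
    """Add up each contributor's scores over all actions of one domain.

    action_results: list of (contributors, scores) pairs, one per action plugin.
    """
    totals = defaultdict(float)
    for contributors, scores in action_results:
        for contributor, score in zip(contributors, scores):
            totals[contributor] += score
    return dict(totals)


def normalize(values):
    """Scale non-negative values so they sum to 1 (scores become weights)."""
    total = sum(values.values())
    return {c: v / total for c, v in values.items()} if total else {}


def weight(action_results):
    return normalize(mangle(action_results))


def merge(domain_weights, domain_importance):
    """Condense per-domain weights into one final weight per contributor."""
    merged = defaultdict(float)
    for domain, weights in domain_weights.items():
        for contributor, w in weights.items():
            merged[contributor] += domain_importance[domain] * w
    return normalize(merged)


if __name__ == "__main__":
    gathered = {
        "Code": [(["alice", "bob"], [300, 100]), (["bob"], [100])],
        "Documentation": [(["carol", "bob"], [50, 50])],
    }
    importance = {"Code": 1.0, "Documentation": 0.5}
    weights = {domain: weight(results) for domain, results in gathered.items()}
    print(merge(weights, importance))
```

Because of the final normalization, the domain weights can be arbitrary positive factors; they do not have to sum to 1.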

Done

Any questions? No? I'm going to bed now!

@yarikoptic
Contributor

I am wondering about one additional aspect: the historical perspective. The initial figure shows the use of git log, but I have not spotted it in the later discussion. I think it may be useful to add a "historical decay rule" (also configurable: a faster decay coefficient would put the emphasis on the most recent states/contributions)

  • Given a state of the repository (current or at some past commit) -- estimate contributions split.
  • Estimate splits across the history of the project, probably not "for each commit" but at regular time intervals (e.g. "last commit of the day") [1]. Store that information for future reuse
  • Combine estimated contributions historically: e.g. use some exponentially decaying (on time) function to add up all (or up to some X) previous states (for which splits are estimated) of the project.

Then that "combined" split is what would be used to decide on how/whom to split current funds allotment.

  • [1] sampling based on time is important to not over-contribute for "many small commits"
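The proposal can be sketched as follows: sample one split per day, then combine the samples with exponentially decaying weights. The half-life parameterisation and the function names are assumptions chosen for illustration; a shorter half-life corresponds to the "faster decay coefficient".

```python
import math
from collections import defaultdict


def daily_samples(snapshots):
    """Keep only the last snapshot per day, so many small commits don't over-count.

    snapshots: chronological list of (datetime.date, split), where split maps
    contributor -> share (summing to 1).
    """
    last = {}
    for day, split in snapshots:
        last[day] = split  # a later snapshot of the same day overwrites earlier ones
    return sorted(last.items())


def decayed_split(snapshots, today, half_life_days=90.0):
    """Combine historical splits, weighting each day by exp(-rate * age)."""
    rate = math.log(2) / half_life_days
    combined = defaultdict(float)
    total = 0.0
    for day, split in daily_samples(snapshots):
        w = math.exp(-rate * (today - day).days)
        total += w
        for contributor, share in split.items():
            combined[contributor] += w * share
    return {c: v / total for c, v in combined.items()} if total else {}
```

With a 90-day half-life, a split from 90 days ago counts half as much as today's.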

@kikass13
Contributor Author

kikass13 commented Sep 16, 2020

general

@yarikoptic
We talked about a concept (which I also mentioned somewhere in my examples) called "time degradation". I guess it's the same thing you mean. I like the idea of including time (absolute and differential) as a means to empower "fresher" contributions.

Some other folks and I talked a little bit about it here:
#132

Regarding the current development

The example I coded for the CDE, which is currently open for review and further improvements (see my fork here: https://github.com/kikass13/libreselery/tree/cd_engine), includes a small plugin-based scoring example built on git blame.

It gathers

  • who contributed how many lines to which file
  • as well as the timestamps of when the contribution happened

So with that plugin, it is technically possible to score newer contributions better than older ones. That's just an example though as the concept of "time" is a difficult one to configure properly.
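As an illustration of scoring newer contributions higher, here is a hypothetical per-author score built from blame data. It also touches on the open questions raised earlier (should the score diminish over time, should it be capped). The input shape (author-time in unix seconds mapped to a line count), the half-life and the optional cap are all assumptions, not part of the plugin.

```python
import time


def score_author(stamps, now=None, half_life_days=180.0, cap=None):
    """Score one author's blamed lines, weighting newer lines higher.

    stamps maps author-time (unix seconds) -> number of lines still attributed
    to that author at that time.
    """
    now = time.time() if now is None else now
    score = 0.0
    for ts, lines in stamps.items():
        age_days = max(0.0, (now - ts) / 86400)
        # a line loses half of its value every half_life_days
        score += lines * 0.5 ** (age_days / half_life_days)
    # an optional cap limits what a single huge contribution can earn
    return min(score, cap) if cap is not None else score
```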

post

In case you have any suggestions or want to help me put a little example of what you said into code, I would be happy to get some help <3

@Ly0n
Owner

Ly0n commented Sep 16, 2020

Meeting Note:
We should rename the "Actions" in the image to "Activities", because the name "Actions" is already used by GitHub Actions.

kikass13 added a commit to kikass13/libreselery that referenced this issue Sep 21, 2020
…connectors; added plugin which gets all remote contributors and scores them with a base uniform score (whatever selery did before in the main class); selery functionality is now restored up until the split() function and works as expected - CAREFULL, i did not add all the weight and split functionality back as as plugin, only some placeholder stuff; Contributrs are now a class used by all plugins, this makes their handling easier and uniform across the code; all actionns now have a generic <params> arg which is parsed from the selery.yml and can be used to express all sorts of stuff; restructured and repaired plugins a little bit; connectors can now be used in plugins, this functionality required some restruecturing and event handling with the CDE object from the LbreSelery main class using updateGlobals() functions Ly0n#164
@kikass13
Contributor Author

Update from commit 31601a4:

I changed some of the internal stuff, but the most important thing is that there is now a plugin which does the same as the previous gather() and weight() functions. It is not identical and a lot of stuff is missing, but the flow works well now and can be adapted to match the previous behavior.

Whats in there now:

  • plugins can now use connectors, which is essential for future improvements
  • plugins can now do essentially all the tasks that were in the main code before, but separated and clean
  • contributors are now 'boiled down' to a simple base class, which means that everyone can treat them the same way (simple)
  • the CDE now does all the heavy lifting, while plugins do all the crazy stuff (user-defined, arbitrary code)

Whats not in there:

  • you guys did a lot of merging random weights together, which I won't attempt to recreate right now
  • you guys had multiple dependency contributor things in there, which I will not recreate right now
    • this could become another plugin in the future, but that is not useful until someone actually takes a closer look at this feature

@kikass13
Contributor Author

kikass13 commented Oct 16, 2020

in case you want to look into it (@fdietze @cornerman) (my fork is here: https://github.com/kikass13/libreselery/tree/cd_engine)

kikass13 added a commit to kikass13/libreselery that referenced this issue Oct 21, 2020
kikass13 added a commit to kikass13/libreselery that referenced this issue Oct 21, 2020
…proper git dir; also fixed a bug there, where non committed authors would not be deleted properly Ly0n#164
kikass13 added a commit to kikass13/libreselery that referenced this issue Oct 26, 2020
…ath and .git dir Ly0n#164; added missing dep to setup file
kikass13 added a commit to kikass13/libreselery that referenced this issue Oct 26, 2020
…a plugin alias helper function which uses the filename as plugin name; removed unnecessary plugin test code Ly0n#164
kikass13 added a commit to kikass13/libreselery that referenced this issue Oct 26, 2020
…ing all steps into one single dict; helper function for splittgn dict into key-val-lists added for convenience; normalize_ step added after merge_ step in the engine, to be more transparent Ly0n#164
kikass13 added a commit to kikass13/libreselery that referenced this issue Oct 26, 2020
…; Configuration and conenctors given to plugins now exist with plugin initialization (and will probably not change, although they could); reworked parameter set for domains and activities to be of a consuming nature, which allows us to spot wrong user config params and report them (we do raise an exception); Removed unnecessary code; Cleanup preps for PR into main repo devel branch; Ly0n#164
kikass13 added a commit to kikass13/libreselery that referenced this issue Oct 26, 2020
@kikass13
Contributor Author

kikass13 commented Oct 26, 2020

After the successful little meeting with @cornerman and @fdietze I changed some of the internal behavior and cleaned up the code. The main talking points were:

Domains

  • should the domain weights be configured to sum to 1, or just as an arbitrary factor (int or float?)
  • should plugins

Plugins

  • should plugins have some kind of parameter check when generic "params" are applied to them
  • plugins could give a reason for initialization error
  • plugins could have a dict of things they need before running
    • plugin-prerequisites
      • api version
      • specific connector needed to run (github, gitlab, whatever)
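A parameter check for the generic "params" could work by consuming known keys and rejecting leftovers, which is how one of the later commits describes the reworked parameter handling ("consuming nature ... we do raise an exception"). The `consume` helper and `ParamError` below are hypothetical names for a minimal sketch of that idea.

```python
class ParamError(ValueError):
    """Raised when a plugin's params are missing or unknown."""


def consume(params, required=(), optional=None):
    """Pop known keys from a plugin's generic params; reject anything left over."""
    remaining = dict(params)  # work on a copy, never mutate the caller's config
    result = {}
    for key in required:
        if key not in remaining:
            raise ParamError(f"missing required parameter: {key!r}")
        result[key] = remaining.pop(key)
    for key, default in (optional or {}).items():
        result[key] = remaining.pop(key, default)
    if remaining:
        # typos like "filtr" end up here instead of being silently ignored
        raise ParamError(f"unknown parameter(s): {', '.join(sorted(remaining))}")
    return result
```

The error message also gives plugins a natural "reason for initialization error".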

Contributor Data

  • should the whole API / all functions rely on a list of contributor objects and a separate list of scores / weights?
    • or should these be unified to a list of contributors, where scores/weights are an attribute of the contributor class

Engine

  • actions should be renamed ("contributions" was an example)
    • I decided to take @Ly0n's approach and call them "activity"/"activities", because that just sounds like my kind of cake
  • contributor data in gather>weight>split>merge pipeline should be immutable objects/dicts
    • I changed the code so that each step takes the result of the previous one and returns its own object
  • Change plugins to work without parallel/separate paths -> i.e. NO onGlobalsUpdate (no dynamic change of configs via event handling)
    • I had to disrupt the libreselery instantiation flow a little, but that was fine
    • the config of libreselery has to be valid at the point of plugin initialization!!!
      • The plugins will work with what they've got after init
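One way to get immutable step results in Python is a frozen dataclass wrapping a read-only mapping view. This is a hedged sketch: `StepResult` and `weight_step` are illustrative names, not the classes in the fork.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class StepResult:
    """Read-only output of one pipeline step (gather, weight, merge, ...)."""

    step: str
    data: MappingProxyType  # contributor -> score/weight, read-only view

    @classmethod
    def of(cls, step, data):
        # copy before wrapping so later edits to `data` cannot leak into the result
        return cls(step, MappingProxyType(dict(data)))


def weight_step(gathered):
    """Takes the previous step's result and returns a new one; never mutates it."""
    total = sum(gathered.data.values())
    weights = {c: s / total for c, s in gathered.data.items()} if total else {}
    return StepResult.of("weight", weights)
```

Any attempt to modify a result raises an error, so a plugin or step cannot silently change data that a later step depends on.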

It was decided that all bold formatted points are relevant prior to a first PR.
The last commits should address all of these "bold" points :)

Ly0n added a commit that referenced this issue Dec 5, 2020
* added cde first impl with configuration and a lot of magic ... stuff is commented out for testing so dont use this #164

* refined plugin based system; implemented first real plugin for local file contributions scanning; 'blacked' my code as far as possible; still work in progress #164

* git lines of code action plugin works now, its not finished but it behaves in a proper manner so that the engine can be tested; CDE now accepts optional and mandatory args for domains, actions etc; only email addresses will be used as keys (identifieing contributors) from now on, we could add algorithms for username<>email aggregation later;  #164

* Reworked gitFile plugin so that it does not suck anymore; Plugins can now access the global libreselery configuration object via self.getGlobals(), which is used by the gitFile plugin to identify the directory to look for files; Added file filter into gitFile plugin so that only files matching the given patterns will be blamed for information #164

* Updated plugins and their handling of globals(); plugins can now use connectors; added plugin which gets all remote contributors and scores them with a base uniform score (whatever selery did before in the main class); selery functionality is now restored up until the split() function and works as expected - CAREFULL, i did not add all the weight and split functionality back as as plugin, only some placeholder stuff; Contributrs are now a class used by all plugins, this makes their handling easier and uniform across the code; all actionns now have a generic <params> arg which is parsed from the selery.yml and can be used to express all sorts of stuff; restructured and repaired plugins a little bit; connectors can now be used in plugins, this functionality required some restruecturing and event handling with the CDE object from the LbreSelery main class using updateGlobals() functions #164

* added include_deps to remote action plugin #164

* fixed bug in test of file_contributions_plugin, it will now find its proper git dir; also fixed a bug there, where non committed authors would not be deleted properly #164

* small bugfix in file contributions plugin regarding current project path and .git dir #164; added missing dep to setup file

* renamed actions to activities; cleaned up plugins a little and added a plugin alias helper function which uses the filename as plugin name; removed unnecessary plugin test code #164

* contribution engine will now work with proper dicts instead of gathering all steps into one single dict; helper function for splittgn dict into key-val-lists added for convenience; normalize_ step added after merge_ step in the engine, to be more transparent #164

* Changed plugins to be stateless (at least for the relevant user code); Configuration and conenctors given to plugins now exist with plugin initialization (and will probably not change, although they could); reworked parameter set for domains and activities to be of a consuming nature, which allows us to spot wrong user config params and report them (we do raise an exception); Removed unnecessary code; Cleanup preps for PR into main repo devel branch; #164

* removed stale code and reapplied black #164

* Update README.md

* Fix some typos in README.md (#188)

* @Ly0n and @KikAss running LibreSelery on a complete awesome list

* improved to work with pip install and further awesome improvements

* added project url to coinbase message

* linter now works

* reviewed and tested changes to run LibreSelery in Docker with @kikass13

Co-authored-by: Tobias Augspurger <tobias.augspurger@protontypes.eu>
Co-authored-by: Tobias Augspurger <ly0@protonmail.com>