wincent.dev

You are viewing an historical archive of past issues. Please report new issues to the appropriate project issue tracker on GitHub.

Feature request #1135: Add repository browsing features

Kind: feature request
Product: wincent.dev
When: created 2008-09-11T15:17:28Z, updated 2010-09-19T09:14:02Z
Status: closed
Reporter: Greg Hurrell
Tags: no tags

Description

This can then be hooked up with Git hooks to publish a "git log" like I had on the old site.

Comments

  1. Greg Hurrell 2009-05-04T16:07:32Z

    I think I probably want to go a fair bit further than just listing commits. As I make more and more of the code open source I'll want to move towards having a full repository browser integrated into the site. Don't want to reinvent the wheel, but it would probably be nice to have at least a skeletal browser in place; in order of interest are probably:

    • commit messages
    • commit diffs
    • full blobs
    • trees
    • branches
  2. Greg Hurrell 2009-11-23T13:33:43Z

    Summary changed:

    • From: Add "commit" model
    • To: Add repository browsing features
  3. Greg Hurrell 2009-11-23T13:52:22Z

    I've been thinking about this and adding the "git log" of old doesn't make much sense at all. It really has to be a full-blown repo browser.

    At the moment, the Git repos happen to be on the same machine as the webserver, but that is going to change with the move to AWS (see ticket #1440).

    Just about every repository browser out there needs to run on the same machine as the repo (think GitWeb itself, for example), so when we're on AWS I think I'll be adding a post-receive hook that will mirror the repos to the same machine as the webserver (basically like the existing hooks which push backups of public repos to GitHub and Gitorious).

    The question then is: how much to do in "real time" by shelling out to the git command-line tool and how much to store in the database?

    At one extreme you shell out for everything, and I don't think I'd do that, even with caching (because it would end up being a parallel caching system separate from the caching for the rest of the site).

    At the other extreme you store absolutely everything in the database, but I don't think I'd do that either because Git itself is a very efficient "database" of commits and blobs and such and replicating it in a relational database like MySQL would be a horrible, inefficient duplication.

    So what I'm thinking is that we could cache some things in the database at boot time (when the app boots), or when the admin forces a refresh: things like scanning the disk to see which repositories are present and what their "metadata" is (name, description, clone URL etc).

    For actually generating things like logs you would probably shell out to git.
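
    As a rough illustration of the "shell out for logs" idea, here is a minimal Python sketch (the language choice, and the names log_command, parse_log and short_log, are assumptions made for illustration, not the site's actual code). It asks git log for a machine-friendly format and parses the result; the caller is assumed to have validated ref already.

```python
import subprocess
from typing import NamedTuple

# Unit separator between fields, so subjects containing spaces or tabs
# still parse unambiguously.
FIELD_SEP = "\x1f"
LOG_FORMAT = "%H%x1f%an%x1f%at%x1f%s"


class LogEntry(NamedTuple):
    sha: str
    author: str
    timestamp: int  # author date, seconds since the epoch
    subject: str


def log_command(git_dir: str, ref: str, limit: int = 20) -> list[str]:
    """Build the argv for a short log; no shell is involved."""
    return [
        "git", f"--git-dir={git_dir}", "log",
        f"--format={LOG_FORMAT}", "-n", str(limit), ref, "--",
    ]


def parse_log(output: str) -> list[LogEntry]:
    """Turn the output of log_command() into LogEntry tuples."""
    entries = []
    for line in output.split("\n"):
        if not line:
            continue
        sha, author, timestamp, subject = line.split(FIELD_SEP, 3)
        entries.append(LogEntry(sha, author, int(timestamp), subject))
    return entries


def short_log(git_dir: str, ref: str, limit: int = 20) -> list[LogEntry]:
    """Shell out to git and return the most recent commits on ref."""
    result = subprocess.run(
        log_command(git_dir, ref, limit),
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

    Keeping the command builder and the parser separate from the subprocess call means the parsing can be exercised without a repository on disk.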

    Once it's actually up and running we can look at performance and consider caching specific things in the database, like commit messages for specific commits and diffs and such. But I really don't know what is going to be useful there until I've tried it.

    Will also have to look at what GitWeb does (it has good performance and, as far as I know, doesn't do much, if any, caching in the vanilla setup), as it is a good yardstick for what sort of performance you can expect, at least out of Perl, when shelling out to git.

    In terms of the models and URL design, the most interesting models/URLs will be:

    • repos:
      • /repos/: index of public repos, and for admin users private repos as well
      • /repos/example.git: overview page for specific repo, showing:
        • "metadata" for that repo: name, description etc
        • a (short) log of recent changes
        • branches (perhaps, or a link to a page showing them)
        • tags (perhaps, or a link to a page showing them)
    • branches: always nested within the context of parent repo
      • /repos/example.git/master or /repos/example.git/maint
      • possible alternative to avoid namespace clashes: /repos/example.git/branches/master
      • show list of commits on that branch (most likely short log view)
    • commits: again, nested within context of parent repo
      • /repos/example.git/commits/{hash}
      • would show full log message and (configurable) diff as well
      • for example, it might be interesting to allow users to see log messages of closed-source repos, but not show the actual diffs
    • tags: again, nested:
      • /repos/example.git/tags/{tag}
      • would show tag annotation, along with the same stuff shown by the "commit" view

    For me that's the fundamentally interesting stuff. Viewing trees and blobs would be some icing on the cake:

    • trees:
      • /repos/example.git/trees/{hash}
    • blobs:
      • /repos/example.git/blobs/{hash}
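
    The URL scheme above can be sketched as a small path parser. This is only an illustration in Python: Route and parse_path are hypothetical names, the namespaced /branches/ variant is assumed (to avoid clashes), and only full 40-character hashes are accepted.

```python
import re
from typing import NamedTuple, Optional

HASH_RE = re.compile(r"[0-9a-f]{40}")
REPO_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*\.git")
HASH_KINDS = {"commits", "trees", "blobs"}
NAME_KINDS = {"branches", "tags"}


class Route(NamedTuple):
    repo: Optional[str]   # None for the index of all repos
    kind: str             # "index", "repo", or one of the nested kinds
    ident: Optional[str]  # hash, branch name or tag name


def parse_path(path: str) -> Optional[Route]:
    """Map a request path onto a Route, or None if it does not match."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "repos":
        return None
    if len(parts) == 1:
        return Route(None, "index", None)
    repo = parts[1]
    if not REPO_RE.fullmatch(repo):
        return None
    if len(parts) == 2:
        return Route(repo, "repo", None)
    kind, rest = parts[2], parts[3:]
    if kind in HASH_KINDS and len(rest) == 1 and HASH_RE.fullmatch(rest[0]):
        return Route(repo, kind, rest[0])
    if kind in NAME_KINDS and rest:
        # Branch and tag names may themselves contain slashes.
        return Route(repo, kind, "/".join(rest))
    return None
```

    Rejecting anything that is not a well-formed hash at the routing layer keeps arbitrary strings from ever reaching a git command line.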

    As a basic security measure, all of these things (commits, trees, blobs) would need to be restricted to objects that are reachable from one of the existing branch tips.
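
    One possible way to enforce the reachability restriction, sketched in Python under the assumption that listing every reachable object is affordable (on a large repo it would need caching). The function names are hypothetical; git rev-list --objects --branches prints commits as a bare id and trees and blobs as an id followed by a path.

```python
import re
import subprocess

FULL_HASH = re.compile(r"[0-9a-f]{40}")


def reachable_objects_command(git_dir: str) -> list[str]:
    """argv listing every object reachable from any branch tip."""
    return ["git", f"--git-dir={git_dir}", "rev-list", "--objects", "--branches"]


def parse_object_ids(output: str) -> set[str]:
    """Extract object ids from `git rev-list --objects` output.

    Commits appear as a bare id; trees and blobs as "<id> <path>".
    """
    return {line.split(" ", 1)[0] for line in output.split("\n") if line}


def is_reachable(git_dir: str, object_id: str) -> bool:
    """True if object_id is a full hash reachable from a branch tip."""
    if not FULL_HASH.fullmatch(object_id):
        # Reject anything that is not a full hash before touching git.
        return False
    result = subprocess.run(
        reachable_objects_command(git_dir),
        capture_output=True, text=True, check=True,
    )
    return object_id in parse_object_ids(result.stdout)
```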

  4. Greg Hurrell 2009-11-23T14:00:40Z

    Obviously, if I want to make commits commentable, they will need to be cached in the database. (Probably lazily; if a particular commit is never viewed there is no need for it to be in the database.)
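
    A minimal sketch of the lazy approach, in Python with SQLite standing in for the real database. The commits table, its columns, and the load_message callback (which would shell out to git) are all assumptions for illustration; the point is only that a row is created the first time a commit is viewed, so comments have something to attach to.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) commits table if it is missing."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS commits ("
        " repo TEXT NOT NULL, sha TEXT NOT NULL, message TEXT NOT NULL,"
        " PRIMARY KEY (repo, sha))"
    )
    return db


def find_or_cache_commit(
    db: sqlite3.Connection,
    repo: str,
    sha: str,
    load_message: Callable[[str, str], str],
) -> int:
    """Return the rowid for a commit, inserting it on first view."""
    row = db.execute(
        "SELECT rowid FROM commits WHERE repo = ? AND sha = ?", (repo, sha)
    ).fetchone()
    if row is not None:
        return row[0]
    # First view: fetch from Git (via the injected loader) and store it.
    message = load_message(repo, sha)
    cursor = db.execute(
        "INSERT INTO commits (repo, sha, message) VALUES (?, ?, ?)",
        (repo, sha, message),
    )
    db.commit()
    return cursor.lastrowid
```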

  5. Greg Hurrell 2009-11-23T17:15:42Z

    For comparison, the GitHub URLs:

    • http://github.com/wincent/wikitext/commit/9f3c2e891a7321e6bf08d6c83c626aae3f3b2585 (show full log message, diff and comments form)
    • http://github.com/wincent/wikitext/tree/master/bin/ (tree referenced by HEAD of master)
    • http://github.com/wincent/wikitext/tree/b61035fd6fd10691fa8ce2b52f2c9ee4b4225ed0/bin (tree pointed to by other commit)
    • http://github.com/wincent/wikitext/commits/ (short log)
    • http://github.com/wincent/wikitext/commits/1.10 (viewing a tag, shows a short log corresponding to that tag)
    • http://github.com/wincent/wikitext/commits/maint (viewing a branch, shows a short log corresponding to that branch)
    • http://github.com/wincent/wikitext/blob/9f3c2e891a7321e6bf08d6c83c626aae3f3b2585/LICENSE.txt (showing a blob)
    • http://github.com/wincent/wikitext/blob/master/LICENSE.txt (showing a blob at HEAD of master branch)
  6. Greg Hurrell 2009-11-29T07:01:36Z

    Posted a blog post about this just now.

    As mentioned in the post, we want this to replace not only the old "Git Log" functionality but also the "Weekly progress reports" that I used to put up on the blog. So that means Atom feeds of commits. Probably:

    • /repos.atom: all commits in all repos
    • /repos/example.atom: all commits in a specific repo (could just be the commits reachable from HEAD, but would probably be more interesting if it were all commits reachable from all branches)
    • /repos/example/master.atom: all commits in a specific branch of a repo

    Two things to note:

    Firstly, URLs will look nicer if I exclude the .git from the repo component (ie. like GitHub and unlike GitWeb).

    Secondly, this kind of complicated feed, especially the "all repos" feed, may require some sophisticated caching. Will probably have to fire off our cache sweepers from the repo post-receive hooks. Actually merging the commits from all repos into a single feed may prove to be quite complicated; luckily the Atom feed doesn't need to extend very far back into the history of each repository.
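
    The merging step for the "all repos" feed might be simpler than it sounds if each repo's recent commits are already ordered newest first. A hedged Python sketch (FeedEntry and merge_feeds are hypothetical names, and strict newest-first ordering per repo is an assumption that git log output only approximates):

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FeedEntry(NamedTuple):
    timestamp: int  # commit time, seconds since the epoch
    repo: str
    sha: str
    subject: str


def merge_feeds(
    per_repo: Iterable[Iterable[FeedEntry]], limit: int = 20
) -> list[FeedEntry]:
    """Merge per-repo commit lists (each newest first) into one feed."""
    merged: Iterator[FeedEntry] = heapq.merge(
        *per_repo, key=lambda entry: entry.timestamp, reverse=True
    )
    # Only the newest `limit` entries are needed for the feed.
    return list(islice(merged, limit))
```

    Because the feed only needs the most recent entries, each repo can contribute a short log of `limit` commits at most, which bounds the work regardless of history size.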

  7. Greg Hurrell 2009-12-04T15:24:11Z

    In GitWeb access control is all file-system based. That is, GitWeb can be configured with GITWEB_LIST and GITWEB_STRICT_EXPORT to look at a certain path on the filesystem for repositories, and will only allow access to those. (The presence of a git-daemon-export-ok file in this case is irrelevant.)

    I think I want more control than that, at the application level. So I am thinking that security-wise, repos will only be shown when they are explicitly added, rather than pulling in all repos that happen to exist within a certain directory. One benefit of this is that I can reference repos in disparate places like /a/b/repo1.git and /b/c/repo2.git.

    In addition to the above access controls we will want application-level constraints about what people can see. Admins will be able to see all configured repositories. Other users will only be able to see open source ones (ones that have been designated as such at the application level and which have a git-daemon-export-ok file, perhaps).

    By setting access control at the application level we can have finer grained levels of control, such as:

    • Allowed to see all logs and content (commits, branches, trees, blobs etc)
    • Allowed to see logs but not content
    • Allowed to see oneline version of logs but not full version
    • Allowed to see existence of repository but nothing else about it (or perhaps just branch names and tag lists, for example)
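
    These levels could be modelled as an ordered enumeration, with each kind of view declaring the minimum level it needs. The following Python sketch is illustrative only: the level names, the view names, and the can_view helper are all assumptions.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0      # repo is hidden entirely
    EXISTS = 1    # existence only
    ONELINE = 2   # one-line log summaries
    LOGS = 3      # full log messages, no content
    FULL = 4      # logs plus diffs, trees and blobs


# Minimum level needed for each kind of view (names are illustrative).
REQUIRED = {
    "repo": Access.EXISTS,
    "shortlog": Access.ONELINE,
    "log": Access.LOGS,
    "diff": Access.FULL,
    "tree": Access.FULL,
    "blob": Access.FULL,
}


def can_view(view: str, public_level: Access, is_admin: bool = False) -> bool:
    """Decide whether a view of a repo may be shown to the current user."""
    required = REQUIRED.get(view)
    if required is None:
        return False  # unknown views are never shown
    if is_admin:
        return True   # admins see every configured repository
    return public_level >= required
```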
  8. Greg Hurrell 2009-12-12T08:25:38Z

    Will have to show merge commits using git diff --cc.

  9. Greg Hurrell 2010-07-23T13:11:47Z

    Ok, this has been sitting around for long enough now. I think it's time to get started on implementing this:

    1. get transport/mirroring working between Git server and web server; still need to decide on whether to use push or pull model, but leaning towards periodic pull (from cron job), although it would involve some lag
    2. start with "Repo" model resource because all the others are nested inside it, and it can actually be initially implemented at the application level with no actual Git access to the filesystem (because it's metadata only, at least at first)
    3. move on to "Commit" model, "Branch" model and "Tag" model (although we already have a model with that name, so will need to pick another); finally tackle the "Tree" and the "Blob" models
  10. Greg Hurrell 2010-07-26T13:37:22Z

    Ok, item "1" now done.

  11. Greg Hurrell 2010-07-28T01:48:59Z

    Sweet, just discovered git log -p --word-diff=porcelain.

  12. Greg Hurrell 2010-07-28T01:56:24Z

    Useful: git log --format=raw (plus the -p --word-diff=porcelain switches mentioned above, and -n 10 or -n 20 to limit the number of commits shown at a time).

  13. Greg Hurrell 2010-07-29T04:37:32Z

    Looks like I can set up a post-receive hook in the mirrored repositories to do cache invalidation. This will be particularly useful for things like Atom feeds which could get hit fairly often, and may be expensive to generate.
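
    A sketch of what such a hook might compute, in Python. A post-receive hook receives one "old new refname" line per updated ref on standard input; the feed paths follow the Atom URLs proposed earlier in this ticket, while the function names and the idea of returning a set of paths to expire are assumptions (the real expiry mechanism would be the app's own).

```python
from typing import Iterable, NamedTuple


class RefUpdate(NamedTuple):
    old: str
    new: str
    ref: str


def parse_updates(lines: Iterable[str]) -> list[RefUpdate]:
    """Parse the lines a post-receive hook reads from standard input."""
    updates = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            updates.append(RefUpdate(*fields))
    return updates


def feeds_to_expire(repo: str, updates: Iterable[RefUpdate]) -> set[str]:
    """Feed paths whose cached copies are stale after the update."""
    prefix = "refs/heads/"
    stale = set()
    for update in updates:
        if update.ref.startswith(prefix):
            branch = update.ref[len(prefix):]
            stale.add(f"/repos/{repo}/{branch}.atom")
    if stale:
        # Any branch update also changes the repo-wide and global feeds.
        stale.update({f"/repos/{repo}.atom", "/repos.atom"})
    return stale
```

    In an actual hook, parse_updates would be fed sys.stdin and the resulting paths handed to the cache sweeper.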

  14. Greg Hurrell 2010-09-19T09:13:54Z

    Ok, the basic functionality is now implemented. I'm going to mark this ticket as closed and open smaller, more focused tickets for the remaining details.

  15. Greg Hurrell 2010-09-19T09:14:02Z

    Status changed:

    • From: open
    • To: closed
Comments are now closed for this issue.
