savannah-hackers-public
Re: [Savannah-hackers-public] Running GNU Savannah (frontend) locally


From: Assaf Gordon
Subject: Re: [Savannah-hackers-public] Running GNU Savannah (frontend) locally
Date: Mon, 01 Sep 2014 20:26:06 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.0

On 09/01/2014 06:47 PM, Karl Berry wrote:
> Assaf, so we can move beyond vague generalities of pros and cons, can
> you please say something about the benefits you see in publishing
> such a database?  I.e., what you hope the result will be?

I see the following advantages:

1. Hacking on the GNU Savannah code itself will be easier, and more inviting.

To fix bugs or to even replicate them, real data is needed.
The one-project example provided with the source code (in bootstrap.sql) is not 
sufficient for any real work.
To make it useful, real data is needed - bug reports, patches, support 
requests, bug histories with markdown, etc.

Not to mention that complicated scenarios (e.g. a user in multiple projects, 
cross-referencing bugs, etc.) are all difficult for occasional contributors to 
simulate.


2. Bringing the current Savannah code up-to-date.
Even if there is a plan to move forward to a better platform, the current state 
of the Savannah server and code is a bit problematic (no offence to anyone, but 
that's my impression).
There are several features that "just work" on the server, but are probably 
impossible to replicate anywhere.
And based on recent discussions on the mailing list, it's not even clear if the 
Savannah git repository is synchronized with what's running on the server.

I want to help improve the code, but I don't want to do my hacking on the 
production server as root.
I want to be able to replicate the environment on any development machine I 
have, and for that, I need (at least some) real data.

Being able to have a development machine will help us bring the code into sync 
with the real server.
It will also (hopefully) help us migrate - because it will show us better 
what's needed from the production server and what's not.
This goes not only for the code, but also for the database.


3. Examining the current databases as preparation for migration to a new 
platform.
If/When GNU Savannah does migrate, how can we know all the current information 
was transferred correctly?
It would be good if we could have a duplicated development machine, with all the 
data, and test different migration options on that machine.
Working on the production server is a "no-no".
Having a development copy of the database is better.
And so preparing such a database is already useful.
If we can make it public, we could have more interested hackers helping with 
the migration.
Or suggesting new features. Or testing the new platform.


4. Allowing interested people to explore the GNU Savannah public data (which is 
already public),
develop new useful features, and find new interesting statistics.
More on that below.

For items #1 and #2 (and perhaps #3), a small subset/snapshot of the database 
is sufficient.
But to be effective, it needs to contain enough real information, such as 
patches, tasks, bugs, multiple types of project configurations, of source code 
versioning programs, of markdown examples, of long bug histories, etc.
Even such a small subset should be based on real data, and so it still requires 
exporting some data and making it public, and so discussing what's private and 
what's not is still needed.

> Also, can you clarify whether you intend that the database would contain
> the public personal information (i.e., anyone could do a query like
> "show me everything public that karl has ever done"), or only aggregate
> information ("how many commits to project xyz")?

First,
Let's agree on what is currently public and what is private on GNU Savannah.
I'm new to the GNU Savannah platform, so I could be missing a lot.
But I think most of the information is *already* public.

Examples, all as "not logged-in user":

List of up to 2000 users with "1" in their username or user account:
  http://savannah.gnu.org/search/?type_of_search=people&words=1+1&offset=0&max_rows=2000
It would be simple enough to run ~100 queries like that and get all users.
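
A minimal sketch of how such a batch of search URLs could be generated. It only builds the URLs and fetches nothing. The parameter names are copied from the URL above; the assumption (not verified against the server) is that results are paged by advancing `offset` in steps of `max_rows`. The function name `search_urls` is made up for illustration.

```python
from urllib.parse import urlencode

SEARCH_BASE = "http://savannah.gnu.org/search/"


def search_urls(kind, words, max_rows=2000, pages=1):
    """Yield search URLs, advancing `offset` by `max_rows` per page.

    Assumption: the server pages its results via `offset`.
    """
    for page in range(pages):
        query = urlencode({
            "type_of_search": kind,   # "people" or "soft", as in the URLs above
            "words": words,           # spaces are encoded as "+"
            "offset": page * max_rows,
            "max_rows": max_rows,
        })
        yield f"{SEARCH_BASE}?{query}"


people_urls = list(search_urls("people", "1 1", pages=3))
```

Covering all users would mean varying `words` (and possibly `offset`) over enough values; the right set of search terms is left open here.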

I can similarly fetch the list of all projects:
  http://savannah.gnu.org/search/?type_of_search=soft&words=1+1&offset=0&max_rows=2000

For each project, I can see the participating users:
 https://savannah.gnu.org/project/memberlist.php?group=gnulib

And all the tasks/bugs/patches/etc. with:
  https://savannah.nongnu.org/support/index.php?group=gnewsense&status_id=0&chunksz=100

But I don't even need that, because one can simply iterate all 
bugs/tasks/patches, by sequential ID:
  https://savannah.gnu.org/support/index.php?100666
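
As a rough illustration of the sequential-ID point, the sketch below generates item URLs for a range of IDs. The URL pattern is taken from the support example above; the helper name `item_urls` is hypothetical, and the assumption that other trackers follow the same `/<tracker>/index.php?<id>` pattern is mine.

```python
def item_urls(tracker, first_id, last_id, host="savannah.gnu.org"):
    """Yield one URL per item ID in [first_id, last_id].

    Only "support" is confirmed by the example above; other tracker
    path names would be assumptions.
    """
    for item_id in range(first_id, last_id + 1):
        yield f"https://{host}/{tracker}/index.php?{item_id}"


support_urls = list(item_urls("support", 100666, 100668))
```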

Then, looking at user "karl":
  http://savannah.gnu.org/users/karl/
will show which projects the user is a member of, when he/she joined them,
the user name, the user number, and public keys.

When it comes to code, the repositories and their logs are already public,
so "show me everything public that ever mentioned Karl" with regard to code is 
mostly a matter of
   git log -i --grep karl
or the equivalent for bzr/hg/svn/etc.


All this information is already public.
Automating web-page parsing is a typical undergrad CS course exercise, so it's 
not a technical problem to do it with a script.

If someone wanted this information, he/she already has it.

The information is not private.

For example, to answer a previous quandary: "which projects are active and which are 
stale",
I am now working on a small script which automatically fetches the logs from 
all the project repositories in Savannah.
It's all public and it's very doable.
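
This is not the script itself, just a sketch of one way the active/stale question could be answered once the repositories are cloned locally. It assumes git repositories only, and the one-year cutoff is an arbitrary choice for illustration. `last_commit_time` and `classify` are hypothetical names.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def last_commit_time(repo_path):
    """Return the committer date of the newest commit in a local git clone.

    Raises CalledProcessError if the path is not a repository or has no commits.
    """
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "-1", "--format=%ct"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return datetime.fromtimestamp(int(out), tz=timezone.utc)


def classify(last_commits, now, stale_after=timedelta(days=365)):
    """Map project name -> 'active' or 'stale' from last-commit datetimes."""
    return {
        name: "active" if now - when <= stale_after else "stale"
        for name, when in last_commits.items()
    }
```

For example, `classify({"gnulib": last_commit_time("/path/to/gnulib")}, datetime.now(timezone.utc))` would label a single clone; the path is a placeholder.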


Now,
It's easier to explain what I want to make public:
Everything that a user who is not logged in to GNU Savannah can already see.


I agree that there is a technical difference between making an interested user 
jump through web-parsing hoops
and providing an SQL-based database which allows simple queries.
But there is no conceptual difference.
The information is already out there.
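
To make the two kinds of query from the question concrete, here is a toy sketch using an in-memory SQLite database. The table layout, column names, and rows are all invented placeholders; they are not the Savannah schema and not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; not the real Savannah schema.
    CREATE TABLE item (
        item_id   INTEGER PRIMARY KEY,
        project   TEXT NOT NULL,
        tracker   TEXT NOT NULL,
        submitter TEXT NOT NULL
    );
    -- Placeholder rows, for illustration only.
    INSERT INTO item VALUES
        (1, 'proj-a', 'bugs',    'user-x'),
        (2, 'proj-a', 'patch',   'user-y'),
        (3, 'proj-b', 'support', 'user-x');
""")

# Per-user query: "everything public that user-x has ever done".
per_user = conn.execute(
    "SELECT item_id, project, tracker FROM item "
    "WHERE submitter = ? ORDER BY item_id",
    ("user-x",),
).fetchall()

# Aggregate query: "how many items per project".
per_project = conn.execute(
    "SELECT project, COUNT(*) FROM item GROUP BY project ORDER BY project"
).fetchall()
```

Both queries read the same public rows; the database only makes the first one as easy to ask as the second.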


Regards,
 - Assaf



