<div>
#pandasSprint write-up, posted by Marc (@datapythonista) on 2018-03-22</div>
<br />
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">On
the 10th of March, the <a href="https://python-sprints.github.io/pandas/">#pandasSprint</a> took place. To the best of my
knowledge, it was an unprecedented kind of event, where around 500 people
worked together to improve the documentation of the popular pandas
library.<br />
</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">As
one of the people involved in organizing the event, I want
to explain why I think it was much more than the
contributions sent and the fun day we had. I also want to provide
information on how it was planned, to help future organizers.</span></span></span></div>
<br />
<h2>
Some historical context</h2>
<br />
<span style="color: #222222; font-family: "arial" , sans-serif;">To
explain where the idea of the #pandasSprint came from, I need to go
back in time more than 15 years. Those were the times when open
source was called free software, people queued to see </span><a href="https://en.wikipedia.org/wiki/Richard_Stallman" style="font-family: arial, sans-serif;">Richard Stallman</a><span style="color: #222222; font-family: "arial" , sans-serif;">
talks, and companies like SCO and Microsoft were on the dark side of
proprietary software. Free software was more about freedom than about
software, and the free software community was working hard and united
to build the software that could challenge the status quo.</span><br />
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><br /></span></span></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://trisquel.info/files/richard%20stallman.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="450" data-original-width="800" height="179" src="https://trisquel.info/files/richard%20stallman.jpg" width="320" /></a></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><br /></span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">Now
we’re in 2018, and things have changed a lot. SCO doesn’t exist
anymore, and Microsoft is one of the companies that supports open
source the most: it employs more Python core developers than any other company, sponsors major events like PyCon and EuroPython, and funds
non-profits like <a href="https://www.numfocus.org/">NumFOCUS</a>, <a href="https://www.python.org/psf/">The Python Software Foundation</a> and even
<a href="http://www.linuxfoundation.org/">The Linux Foundation</a>. Python is growing in popularity, and nobody
questions the advantages of open source software.</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">But
what happened to all the free software hackers who worked tirelessly
to bring their projects to the highest standards? Of course, many of
them are still around, but my perception is that the growth in
popularity of open source projects didn’t translate linearly into a growth in
the number of contributors. And I think pandas is one of the clearest
examples.</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">Over
the last few years, pandas has become a de facto standard in data
analytics and data science. Recently, Stack Overflow reported that <a href="https://stackoverflow.blog/2017/09/14/python-growing-quickly/">almost 1% of their traffic from developed countries is caused by pandas</a>.
The book Python for Data Analysis, by pandas creator Wes McKinney, has <a href="https://twitter.com/wesmckinn/status/974303935530876928">sold more than 250,000 copies</a>, and the pandas website has around <a href="https://twitter.com/jorisvdbossche/status/974322924034449408">400,000 active users per month</a>. It’s difficult to know how many pandas users
exist, but some <a href="https://twitter.com/teoliphant/status/974056911627866113">informed opinions talk about 5 million</a>.</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://stackoverflow.blog/wp-content/uploads/2017/09/related_tags_over_time-1-1200x1200.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="800" data-original-width="800" height="320" src="https://stackoverflow.blog/wp-content/uploads/2017/09/related_tags_over_time-1-1200x1200.png" width="320" /></a></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">What
about the contributors? In a quick look at <a href="https://github.com/pandas-dev/pandas/graphs/contributors">GitHub</a>, I counted 12
developers who have been active in the last year and have
contributed more than 20 commits to the project. Taking the estimate of 5 million users, this leaves a ratio
of 1 significant contributor for more than 400,000 users. Not long
before the #pandasSprint the project reached 1,000 contributors,
meaning that only 1 in every 5,000 users has ever made a contribution.</span></span></span></div>
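<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">The ratios above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the rough estimate of 5 million users quoted earlier, and the variable names are made up for illustration.</span></div>

```python
# Rough figures quoted in the text; the user count is only an estimate.
estimated_users = 5_000_000
significant_contributors = 12      # active in the last year, more than 20 commits
all_time_contributors = 1_000

# Users per significant contributor, and users per all-time contributor.
users_per_significant = estimated_users // significant_contributors
users_per_contributor = estimated_users // all_time_contributors

print(f"1 significant contributor per {users_per_significant:,} users")
print(f"1 contributor per {users_per_contributor:,} users")
```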
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">You
may find these numbers small or large, depending on your expectations. And it’s
difficult to compare without numbers about Python projects 10 years ago.
But my feeling is that we transitioned from a free software community
of developers actively participating in the projects, to a community
of mainly users, who in many cases see free software as <a href="https://en.wikipedia.org/wiki/Free_as_in_Freedom">free beer</a>.</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<h2>
How to become part of the open source community</h2>
<div>
<br /></div>
<div style="line-height: 100%; margin-bottom: 0cm;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">I
don’t know why people become part of the open source community, in
terms of participating actively in it. But I know how I did. It’s a
beautiful and sad story that I want to share.</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">Around
12 years ago, I was quite new to Python, but I really liked the
language compared to what I had used before. Most of what I was doing was
web based, so I quickly discovered Django, and fell in love. What in
PHP (the de facto standard at that time) took one week or more to
implement, in Django was done in minutes, and with much higher
quality. Django was simply amazing, the web framework for
perfectionists with deadlines. But in some areas it was not as mature as it
is now, mainly in localization. The system to
translate static text was amazing, but you couldn't make calendars
start on Monday, or use the comma as a decimal separator. That was a
big problem for me, as my users in Spain wouldn't be happy using the
US localization. The good news was that it was open source, so I
started to take a look at what could be done.</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">When
I submitted my first bug reports and patches to Django, I found the
best mentor a newcomer to open source can find, Malcolm Tredinnick.
He was the core developer most involved in the localization part of
Django. Malcolm helped me in every step, and I learned a lot from him
about Python, Django, Subversion... But I also learned from him (and
from others in the community) about kindness and collaboration.
It was a really welcoming community, and honestly, at the beginning I
was quite surprised by the amount of time people were happy to
spend helping and supporting someone who didn’t have much
to contribute. After some time, I became more experienced, and
I was able to contribute back, taking care of the Catalan and Spanish
translations for some years, and doing a major <a href="https://datapythonista.blogspot.co.uk/2009/12/new-localization-system-already-in.html">refactoring of Django's localization system</a>, as part of a <a href="https://summerofcode.withgoogle.com/">Google Summer of Code</a>. But
who could have known that beforehand?</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://puzzling.org/wp-content/uploads/2013/03/2834869959_85974cbd42_b-248x300.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="300" data-original-width="248" src="https://puzzling.org/wp-content/uploads/2013/03/2834869959_85974cbd42_b-248x300.jpg" /></a></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">I
was in shock when in 2013 <a href="https://www.djangoproject.com/weblog/2013/mar/19/goodbye-malcolm/">Malcolm passed away</a>. Besides being a
tragedy for him and those close to him, it was also one for many of us, who
had barely met him in person, but considered him a friend. The Django
Software Foundation created the <a href="https://www.djangoproject.com/foundation/prizes/">Malcolm Tredinnick Memorial Prize</a> in
his honor. The prize is awarded, quoting the DSF page, “to the
person who best exemplifies the spirit of Malcolm’s work - someone
who welcomes, supports and nurtures newcomers; freely gives feedback
and assistance to others, and helps to grow the community”.</span></span></span></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;">Malcolm
was unique, but the open source community is the amazing community it
is, because there are so many amazing people who exemplify the
spirit of Malcolm every day.</span></span></span></div>
<h2>
<br /> London Python Sprints</h2>
<div>
<br /></div>
<div style="line-height: 100%; margin-bottom: 0cm;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">So,
with such an amazing community (and I experienced it enough to be
sure about it), what is preventing more people from getting involved? I
would say most people think that, technically speaking, they are not
good enough for the projects. That you need the mind of <a href="https://en.wikipedia.org/wiki/Alan_Turing">Alan Turing</a>,
<a href="https://en.wikipedia.org/wiki/Dennis_Ritchie">Dennis Ritchie</a> or <a href="https://en.wikipedia.org/wiki/Linus_Torvalds">Linus Torvalds</a> to make a contribution. I strongly
disagree. Even less technical people can participate in many
things such as translations, writing documentation, ticket triaging…
There are also many great projects in their early stages where
contributing code is much easier than contributing to the more
complex and intimidating ones.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">Then,
</span><span style="font-weight: normal;">what’s the problem?
Personally, I think the only problem is getting started. The first
time, it’s difficult to find a task to get started. It’s
difficult to understand the <a href="https://docs.google.com/presentation/d/1rOSYXZPyMe9KXnbVK_xbJzw_-ijxd6bIxndmvPU6L2o/edit?usp=sharing">logistics of sending a pull request</a>. It’s
difficult to know beforehand whether project maintainers will welcome
our small contributions. And it may be difficult to even know that we
need a task to work on, that we need to send a pull request, or that
there is a community out there working on every project. But these
things are only difficult until someone helps you get started.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://secure.meetupstatic.com/photos/event/5/e/a/f/highres_465084239.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="800" height="240" src="https://secure.meetupstatic.com/photos/event/5/e/a/f/highres_465084239.jpeg" width="320" /></a></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">With
this idea in mind, <a href="https://www.meetup.com/Python-Sprints/">London Python Sprints</a> was born. A place where open
source contributors could mentor newcomers in their first steps. And
personally, I think it’s very successful. Not only did we manage to
send around 50 pull requests to different projects in 2017, but
people who made their first pull request with us are now the mentors
helping others get started.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<h2>
#pandasSprint: the idea</h2>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://secure.meetupstatic.com/photos/event/6/2/2/1/highres_468505121.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="358" data-original-width="800" height="143" src="https://secure.meetupstatic.com/photos/event/6/2/2/1/highres_468505121.jpeg" width="320" /></a></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">While
the experience in London was great, it was very small in scale. And we
could do much better. All it takes for many people to love becoming a
contributor is some guidance in these first steps. We
already had the experience from several months of sprints in London,
and with some preparation we could help other user groups do the
same.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">Why
pandas? There are plenty of great projects to contribute to, but pandas
had several things going for it. Everybody loves pandas, and it’s very
popular. It’s a welcoming project in the spirit of Malcolm. Improving the
documentation would be something very useful. And it’s one of the
projects I’m most familiar with.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">But
it’s probably clear that the goal wasn’t so much about the
specific project or contributions, but about letting people get into the open
source world in the way many of us love it: becoming part of it, and
not just being users of some software we don’t need to pay for.</span></span></span></span></div>
<h2>
<br /> #pandasSprint: the implementation</h2>
<div>
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">So,
we wanted to have a huge open source party, but of course that
required a huge amount of work.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">The
first thing was to make sure the pandas core developers were happy
with it. It was going to be a lot of work on their side, and they
know much more about pandas than anyone else, so they could tell whether
it was a good idea, and provide useful feedback. An email to <a href="https://twitter.com/jreback">Jeff Reback</a> was enough to start. He loved the idea, even if I
think he didn’t believe at that time it was going to be something
as big as it finally was. :)</span></span></span></span></div>
<h3>
Dividing the work</h3>
<div style="line-height: 100%; margin-bottom: 0cm;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;"><br /></span></span></span></span>
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">The
next thing was to make sure everybody had something to work on during
the sprint. Working on the documentation made that possible.
There are around <a href="https://docs.google.com/spreadsheets/d/10EpQFkVDqiIFLLVGtIWzCMRACz20yWuta3_DU0qV6-E/edit?usp=sharing">1,200 API pages</a> in the pandas documentation. Writing
a script to get the list was easy. We could even gather some
information on the state of the documentation (which pages had
examples, which methods had mistakes in their documented
parameters…).</span></span></span></span></div>
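<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">A script of this kind could be sketched with Python's <code>inspect</code> module. This is not the script used for the sprint, only a minimal illustration: the <code>Toy</code> class and all function names are hypothetical, and it assumes numpydoc-style docstrings, where section titles are underlined with dashes.</span></div>

```python
import inspect


def numpydoc_sections(docstring):
    """Split a numpydoc-style docstring into {section title: body lines}."""
    lines = inspect.cleandoc(docstring or "").splitlines()
    sections = {}
    current = None
    for index, line in enumerate(lines):
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if line.strip() and next_line and set(next_line) == {"-"}:
            # A title is a non-empty line followed by a line of dashes.
            current = line.strip()
            sections[current] = []
        elif line and set(line) == {"-"}:
            continue  # skip the underline itself
        elif current is not None:
            sections[current].append(line)
    return sections


def documented_parameters(sections):
    """Return the parameter names listed in the Parameters section."""
    names = []
    for line in sections.get("Parameters", []):
        # Names start at column 0; their descriptions are indented.
        if line and not line[0].isspace():
            names.append(line.split(":")[0].strip())
    return names


def docstring_report(cls):
    """List the public methods of a class with basic docstring checks."""
    rows = []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue
        sections = numpydoc_sections(member.__doc__)
        actual = [p for p in inspect.signature(member).parameters if p != "self"]
        rows.append({
            "name": name,
            "has_examples": "Examples" in sections,
            "params_match": documented_parameters(sections) == actual,
        })
    return rows


class Toy:
    """Hypothetical class standing in for a real API."""

    def scale(self, factor):
        """
        Multiply the value.

        Parameters
        ----------
        factor : float
            Multiplier.

        Examples
        --------
        >>> Toy().scale(2)
        """

    def shift(self, offset):
        """
        Add to the value.

        Parameters
        ----------
        amount : float
            Documented under the wrong name on purpose.
        """


for row in docstring_report(Toy):
    print(row)
```

<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">Here <code>scale</code> is reported as having examples and matching parameters, while <code>shift</code> is flagged on both checks.</span></div>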
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">The
trickiest part was the system to share docstrings in pandas. There
are many functions and methods in pandas that are similar enough to
have a shared template for the documentation, customized with a few
variables specific to each page. The original idea was to use Python's
introspection system to find the exact ones sharing a template, so we
could avoid duplicates. That was more complex than it originally
seemed, and we finally delegated the task of finding them to each user
group. </span><span style="font-weight: normal;">To help with that, we
divided the pages into groups by topic, and assigned whole groups to
each sprint chapter. Sharing of docstrings was more likely to happen
inside these groups. For example, all the functions in Series.str
were in one group. Functions like <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.lower.html">lower()</a>, <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.upper.html">upper()</a>, <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.capitalize.html">capitalize()</a> use
the same template, so it should be relatively easy to detect it in the
chapter working on that group.</span></span></span></span></div>
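<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">To illustrate what sharing a docstring means, here is a minimal sketch. pandas has its own helpers for this, which are not reproduced here: the <code>shared_doc</code> decorator, the <code>_doc_template</code> attribute and the template text are all hypothetical.</span></div>

```python
from collections import defaultdict

# One template, customized with a variable for each function.
CASE_TEMPLATE = """
Convert strings to {style}.

Returns
-------
str
    Converted string.
"""


def shared_doc(template, **variables):
    """Fill a docstring template and attach it to the decorated function."""
    def decorator(func):
        func.__doc__ = template.format(**variables)
        # Hypothetical marker that records which template was used.
        func._doc_template = template
        return func
    return decorator


@shared_doc(CASE_TEMPLATE, style="lowercase")
def lower(text):
    return text.lower()


@shared_doc(CASE_TEMPLATE, style="uppercase")
def upper(text):
    return text.upper()


def strip(text):
    """Remove leading and trailing whitespace."""
    return text.strip()


def group_by_template(functions):
    """Return lists of function names that share a docstring template."""
    groups = defaultdict(list)
    for func in functions:
        template = getattr(func, "_doc_template", None)
        if template is not None:
            groups[template].append(func.__name__)
    return [names for names in groups.values() if len(names) > 1]


print(group_by_template([lower, upper, strip]))
```

<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">In this toy version the grouping is trivial because each function records its template. Without a marker like that, only the rendered docstrings are visible at runtime, and they differ from function to function, which is one plausible reason why finding the shared ones by introspection is harder than it looks.</span></div>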
<h3>
Documentation</h3>
<div style="line-height: 100%; margin-bottom: 0cm;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;"><br /></span></span></span></span>
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">Then,
after being able to provide each participant with a task, we had to make
sure everybody knew what to do. For that, there were two main things:
first, having documentation explaining all the steps, and second,
having mentors in every city.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">For
the documentation, we had 3 main documents:</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">-
<a href="https://python-sprints.github.io/pandas/guide/pandas_setup.html">Set up instructions</a> (installing requirements, cloning the repository,
compiling C extensions…)</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">-
<a href="http://pandas-docs.github.io/pandas-docs-travis/contributing_docstring.html">Guide</a> on how to write a docstring for pandas</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">-
<a href="https://python-sprints.github.io/pandas/guide/pandas_pr.html">Instructions</a> on how to validate the changes, and submit them</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">The
most complex part was defining what a “perfect” docstring should
look like. Following some standards would be very useful for pandas
users. All the pages would be implemented in the best possible way we
could think of. And users would be able to get used to one format,
and find information faster.</span></span></span></span></div>
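<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">As a rough idea of the kind of format involved, here is a sketch of a numpydoc-style docstring with a summary, a longer description, and Parameters, Returns and Examples sections. The function <code>clip_value</code> is hypothetical and not part of pandas, and the linked guide remains the reference for the exact rules.</span></div>

```python
def clip_value(value, lower, upper):
    """
    Limit a number to a closed interval.

    Values below ``lower`` are replaced by ``lower``, and values above
    ``upper`` are replaced by ``upper``. Values inside the interval are
    returned unchanged.

    Parameters
    ----------
    value : int or float
        Number to limit.
    lower : int or float
        Smallest value that can be returned.
    upper : int or float
        Largest value that can be returned.

    Returns
    -------
    int or float
        The limited value.

    Examples
    --------
    >>> clip_value(5, 0, 10)
    5
    >>> clip_value(-3, 0, 10)
    0
    >>> clip_value(42, 0, 10)
    10
    """
    return max(lower, min(value, upper))
```

<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">Because the examples are written as interactive sessions, they can be checked automatically with Python's <code>doctest</code> module.</span></div>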
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">We
started with a draft of a guide in the form of a <a href="https://github.com/pandas-dev/pandas/pull/19704/files">pull request</a>, so
everybody could review and add comments. Then there was a bit of
discussion on the points that were unclear or where people disagreed. I think the
result was great. But of course we couldn’t anticipate all the
cases.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="line-height: 100%; margin-bottom: 0cm;">
We also had to write <a href="https://github.com/pandas-dev/pandas/pull/20016/files">documentation</a> about shared docstrings, and the preferred way to implement them. <a href="https://twitter.com/TomAugspurger">Tom Augspurger</a> took care of it.<br />
<h3>
Mentoring</h3>
</div>
<div style="line-height: 100%; margin-bottom: 0cm;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;"><br /></span></span></span></span>
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">A
key thing was to make sure that in every location we had people who could
mentor participants. We created a <a href="https://gitter.im/py-sprints/pandas-doc">gitter channel</a> for the event, but
it would be difficult to help remotely with anything beyond specific questions.
Everybody was in their own local sprint, and we also had different
time zones, so availability during the sprint would be limited.</span></span></span></span></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<br /></div>
<div style="font-style: normal; font-variant: normal; letter-spacing: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
<span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="font-weight: normal;">So,
what we did was to ask <a href="https://docs.google.com/spreadsheets/d/138095mUxOTOCCXmvQGz7YOh-0yWLoTH_8_IlrAI5w2c/edit?usp=sharing">somebody from each chapter to work on a task before the sprint</a>. In most cases that was the organizers themselves. I don't know if it is true, but I had the feeling that some organizers were underestimating how complex improving
a single API documentation page is, and how difficult it can be to help a large group of people who are making their first open source contribution. Letting them prepare beforehand should be useful in different ways:</span></span></span></span></div>
<div style="font-variant-east-asian: normal; font-variant-numeric: normal; line-height: 100%; margin-bottom: 0cm; orphans: 2; widows: 2;">
</div>
<ul>
<li><span style="color: #222222; font-family: "arial" , sans-serif;">Organizers would be better prepared, and have a better sprint, without so much stress and uncertainty.</span></li>
<li><span style="color: #222222; font-family: "arial" , sans-serif;">They should be able to help participants better.</span></li>
<li><span style="color: #222222; font-family: "arial" , sans-serif;">The "mini" sprint of the organizers would be a proof of concept that would let us anticipate problems in the documentation, the procedure...</span></li>
</ul>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">Not all the organizers found the time to prepare, as we were ready to start this stage less than a week before the global sprint date. But I think it was very useful for the ones who could prepare for the sprint.</span></div>
<h3>
Tools</h3>
<div style="line-height: 100%; margin-bottom: 0cm;">
<span style="font-variant: normal;"><span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="letter-spacing: normal;"><span style="font-style: normal;"><span style="font-weight: normal;"><br /></span></span></span></span></span></span></span>
<span style="font-variant: normal;"><span style="color: #222222;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="letter-spacing: normal;"><span style="font-style: normal;"><span style="font-weight: normal;">One
of the areas we worked on while preparing the sprint was having better
tools. <a href="https://twitter.com/jorisvdbossche">Joris Van den Bossche</a>, besides being key in all the parts of the sprint, did an amazing job on this part.</span></span></span></span></span></span></span><span style="font-variant: normal;"><span style="color: #222222;"><span style="text-decoration: none;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="letter-spacing: normal;"><span style="font-style: normal;"><span style="font-weight: normal;"> </span></span></span></span></span></span></span></span><span style="font-variant: normal;"><span style="color: #222222;"><span style="text-decoration: none;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="letter-spacing: normal;"><span style="font-style: normal;"><span style="font-weight: normal;">We
implemented a way to <a href="https://github.com/pandas-dev/pandas/pull/19840/files">build a single document in Sphinx</a>, and a <a href="https://github.com/pandas-dev/pandas/blob/master/scripts/validate_docstrings.py">script to detect formatting errors in docstrings</a>. We also set up a
<a href="https://github.com/pandas-dev/pandas/pull/20015/files">sphinx plugin to easily include plots in the documentation</a>, which <a href="http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.plot.kde.html">made some pages look really great</a>.</span></span></span></span></span></span></span></span></div>
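<div>
To give an idea of the kind of check such a script can perform, here is a minimal sketch. It is not the actual <code>validate_docstrings.py</code>: the <code>check_docstring</code> helper and the list of required sections are assumptions made up for this illustration, loosely following the numpydoc layout (a section title underlined with dashes).</div>

```python
import inspect

# Hypothetical list of required sections, chosen only for this illustration.
REQUIRED_SECTIONS = ("Parameters", "Returns", "Examples")


def check_docstring(func):
    """Return a list of formatting problems found in the docstring of func."""
    doc = inspect.getdoc(func)
    if not doc:
        return ["missing docstring"]
    errors = []
    lines = doc.splitlines()
    summary = lines[0]
    if not summary[0].isupper():
        errors.append("summary should start with a capital letter")
    if not summary.endswith("."):
        errors.append("summary should end with a period")
    for section in REQUIRED_SECTIONS:
        # A numpydoc section is its title followed by a line of dashes.
        found = any(
            line.strip() == section and nxt.strip() == "-" * len(section)
            for line, nxt in zip(lines, lines[1:])
        )
        if not found:
            errors.append("missing section: " + section)
    return errors


def add(a, b):
    """
    Add two numbers.

    Parameters
    ----------
    a, b : int
        Numbers to add.

    Returns
    -------
    int
        The sum of both numbers.
    """
    return a + b


def undocumented():
    pass


print(check_docstring(add))
print(check_docstring(undocumented))
```

<div>
Here <code>add</code> is reported as missing an Examples section, which is the kind of feedback a participant can act on without waiting for a reviewer.</div>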
<div style="line-height: 100%; margin-bottom: 0cm;">
<br /></div>
<div style="line-height: 100%; margin-bottom: 0cm;">
<span style="font-variant: normal;"><span style="color: #222222;"><span style="text-decoration: none;"><span style="font-family: "arial" , sans-serif;"><span style="font-size: small;"><span style="letter-spacing: normal;"><span style="font-style: normal;"><span style="font-weight: normal;">At the last
minute, we also built a <a href="https://python-sprints.github.io/pandas/dashboard.html">dashboard</a>
with a list of checkpoints that participants could follow during the
day, so that it was clearer what to do next and easier to make
better contributions.</span></span></span></span></span></span></span></span></div>
<div style="line-height: 100%; margin-bottom: 0cm;">
<h3>
Promotion</h3>
</div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif; font-size: 16px;"><br /></span>
<span style="color: #222222; font-family: "arial" , sans-serif; font-size: 16px;">We promoted the event, and looked for people willing to participate, in different ways:</span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">The first one was to send direct messages to the organizers of different communities. One of the great things about the Python community is how well organized it is. A <a href="https://www.meetup.com/pro/pydata">single page</a> links to the almost 100 PyData meetups all around the world, and the Python website has a <a href="https://wiki.python.org/moin/LocalUserGroups">wiki</a> listing dozens of Python user groups. Not everybody we contacted was interested, or even answered, but most of the groups were really happy with the idea.</span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<a href="https://www.python.org/psf/" style="font-family: arial, sans-serif; font-size: 16px;">The Python Software Foundation</a> and<span style="color: #222222; font-family: "arial" , sans-serif; font-size: 16px;"> </span><a href="https://www.numfocus.org/" style="font-family: arial, sans-serif; font-size: 16px;">NumFOCUS</a><span style="color: #222222; font-family: "arial" , sans-serif;"> were also key in spreading the word about the event.</span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">As the sprint was about working on the documentation, we also contacted <a href="http://www.writethedocs.org/">Write the Docs</a>, a global community focused on writing technical documentation. Some of their members joined the sprint too.</span><br />
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span>
<br />
<h2>
<span style="color: #222222; font-family: "arial" , sans-serif;">The sprint</span></h2>
</div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">For the day of the sprint, we had a last-minute surprise. I really think that what every participant was going to do was something great, even if in a way it felt more like a Saturday with friends. And I thought it was worth letting people know how important it is to contribute to the open source projects that power everything from scientific research to the financial markets, or the data science infrastructure of so many companies in the world. So, just a few hours before the sprint we spoke with <a href="https://twitter.com/wesmckinn">Wes McKinney</a>, creator of pandas, <a href="https://twitter.com/NaomiCeder">Naomi Ceder</a>, chair of the Python Software Foundation, and Leah Silen, executive director at <a href="https://twitter.com/NumFOCUS">NumFOCUS</a>, to see if they could record a short message for the participants. Even with the very short notice, all of them sent really great messages that we could show the participants at the beginning of the sprints.</span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/YnFKV2oxs8Q/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/YnFKV2oxs8Q?feature=player_embedded" width="320"></iframe></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">It's difficult to know what happened in the sprint at a global scale. I think in London we had a great time, with a nice atmosphere and a luxury location provided by our sponsor <a href="https://twitter.com/TechAtBloomberg">Bloomberg</a>. For most of us the sprint seemed too short, even though I think there was a typical British pub follow-up to the sprint, which I couldn't join.</span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://secure.meetupstatic.com/photos/event/a/5/1/5/highres_469122261.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="800" height="240" src="https://secure.meetupstatic.com/photos/event/a/5/1/5/highres_469122261.jpeg" width="320" /></a></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">From what I know, the experience was also good in other locations. It's worth taking a look at the <a href="https://twitter.com/hashtag/pandasSprint">Twitter feed of the sprint</a>.</span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjO3fRufS5c9gEAMdxb46TsDGijGUE1YX0VZVRtxLszXPsj83L_5z1xZV-TOkSuCYqBa7O8onXvqYAALFRUCSOitEJo9TsNCQSKvYkTF-EOJ77gbT9owuL_gLi5Um2cfZuEbND2Z1kXQGXR/s1600/DX9KVEaX0AAhZ0N.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="357" data-original-width="502" height="227" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjO3fRufS5c9gEAMdxb46TsDGijGUE1YX0VZVRtxLszXPsj83L_5z1xZV-TOkSuCYqBa7O8onXvqYAALFRUCSOitEJo9TsNCQSKvYkTF-EOJ77gbT9owuL_gLi5Um2cfZuEbND2Z1kXQGXR/s320/DX9KVEaX0AAhZ0N.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUEGpb9WC4GSjRiyJT4z5Q1p530nXJ399InB4YIocdcXc8YNIewYWeFHZ082Z86mbl_QJ_Fy26gfagvwZZxpndr0g6LQUBCMINKA3KzfyrdDNsBgQO4tFKnFOsrAFO-4xgwdkeyoIie3D6/s1600/DX9KVEXX4AIpNYv.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1024" data-original-width="768" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUEGpb9WC4GSjRiyJT4z5Q1p530nXJ399InB4YIocdcXc8YNIewYWeFHZ082Z86mbl_QJ_Fy26gfagvwZZxpndr0g6LQUBCMINKA3KzfyrdDNsBgQO4tFKnFOsrAFO-4xgwdkeyoIie3D6/s320/DX9KVEXX4AIpNYv.jpg" width="240" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLMn4SK3E59fXUIgmMhUngz89aYN0c6pyV9babN_kbxD2YPbgS9YifvW9qyoC2zw5aWzRtcnQDcKaDm7aq2ozPpA74pbMhh13eYRd7D_YQ8e1b0rkSoIxCxXUcw-ETqb0G8_7uBNcWmsse/s1600/DX9KVEWWkAQR4hg.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="900" data-original-width="1200" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiLMn4SK3E59fXUIgmMhUngz89aYN0c6pyV9babN_kbxD2YPbgS9YifvW9qyoC2zw5aWzRtcnQDcKaDm7aq2ozPpA74pbMhh13eYRd7D_YQ8e1b0rkSoIxCxXUcw-ETqb0G8_7uBNcWmsse/s320/DX9KVEWWkAQR4hg.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQz1x1_hcEGqp4EcLoCPw0e97Cqm5l34uoxyxmByBOobM597N8SNkismM3tXtGIW59XMLoQOQpmw_zm1ESimIZsb2G1zrLH8eGD_RLXXgaPtyo5OufhnzOK91jOClX3jc-F817v4kFbY2t/s1600/DX9KVEXWsAImaHY.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="900" data-original-width="1200" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQz1x1_hcEGqp4EcLoCPw0e97Cqm5l34uoxyxmByBOobM597N8SNkismM3tXtGIW59XMLoQOQpmw_zm1ESimIZsb2G1zrLH8eGD_RLXXgaPtyo5OufhnzOK91jOClX3jc-F817v4kFbY2t/s320/DX9KVEXWsAImaHY.jpg" width="320" /></a></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;"><br /></span></div>
<div>
<span style="color: #222222; font-family: "arial" , sans-serif;">Also, I really enjoyed reading the write-ups that some organizers and participants wrote:</span></div>
<div>
<ul>
<li>From Iva and Tsvetelina, organizers in Sofia: <a data-saferedirecturl="https://www.google.com/url?hl=en&q=https://www.facebook.com/evolutiontc/posts/2040798282603060&source=gmail&ust=1521761024772000&usg=AFQjCNHU4VmOnoeKz1oapZgDDQjo3hUZaQ" href="https://www.facebook.com/evolutiontc/posts/2040798282603060" style="background-color: white; color: #1155cc; font-family: arial, sans-serif; font-size: 12.8px;" target="_blank">https://www.facebook.<wbr></wbr>com/evolutiontc/posts/<wbr></wbr>2040798282603060</a></li>
<li>From Priyanka, a participant in Amsterdam: <a data-saferedirecturl="https://www.google.com/url?hl=en&q=https://www.linkedin.com/pulse/pandassprint-amsterdam-my-experiences-priyanka-ojha/&source=gmail&ust=1521676822822000&usg=AFQjCNEzI1IaAheFGl2TAyuKPIyrgaVemA" href="https://www.linkedin.com/pulse/pandassprint-amsterdam-my-experiences-priyanka-ojha/" style="color: #1155cc; font-family: arial, sans-serif; font-size: small;" target="_blank">https://www.linkedin.com/<wbr></wbr>pulse/pandassprint-amsterdam-<wbr></wbr>my-experiences-priyanka-ojha/</a></li>
<li>From <a href="https://twitter.com/IHackPY">Himanshu</a>, organiser in <a href="https://twitter.com/PythonKanpur">Kanpur</a>, India: <a data-saferedirecturl="https://www.google.com/url?hl=en&q=https://kanpurpython.wordpress.com/2018/03/15/experience-of-pandas-documentation-sprint/&source=gmail&ust=1521676822822000&usg=AFQjCNH6ur9Mf7G98G-bfJFGPJN_WE9ung" href="https://kanpurpython.wordpress.com/2018/03/15/experience-of-pandas-documentation-sprint/" style="background-color: white; color: #1155cc; font-family: arial, sans-serif; font-size: 12.8px;" target="_blank">https://kanpurpython.<wbr></wbr>wordpress.com/2018/03/15/<wbr></wbr>experience-of-pandas-<wbr></wbr>documentation-sprint/</a></li>
<li>Live streaming of the sprint in Shenzhen: <a href="https://www.youtube.com/watch?v=SK-sF_biP04">https://www.youtube.com/watch?v=SK-sF_biP04</a></li>
<li>From Marc, participant in Toronto: <a href="https://towardsdatascience.com/making-my-first-open-source-software-contribution-8ebf622be33c">https://towardsdatascience.com/making-my-first-open-source-software-contribution-8ebf622be33c</a></li>
<li>From <a href="https://bluekiri.com/">Bluekiri</a>, sponsor in Mallorca: <a data-saferedirecturl="https://www.google.com/url?hl=en&q=https://medium.com/bluekiri/pandas-documentation-sprint-90f5a76c0e24&source=gmail&ust=1521676822822000&usg=AFQjCNGxkFUCBgOGYXS7Tvs6g5QuGA2akA" href="https://medium.com/bluekiri/pandas-documentation-sprint-90f5a76c0e24" style="background-color: white; color: #1155cc; font-family: arial, sans-serif; font-size: 12.8px;" target="_blank">https://medium.com/bluekiri/<wbr></wbr>pandas-documentation-sprint-<wbr></wbr>90f5a76c0e24</a></li>
</ul>
</div>
And it's worth taking a look at this analysis of the impact of the sprint on the pandas GitHub activity by <a href="https://twitter.com/jorisvdbossche">Joris</a>:<br />
<a href="https://jorisvandenbossche.github.io/blog/2018/03/13/pandas-sprint-activity/">https://jorisvandenbossche.github.io/blog/2018/03/13/pandas-sprint-activity/</a><br />
<h2>
#pandasSprint aftermath</h2>
<div>
<br /></div>
<div>
This is what I think was the aftermath of the sprint:</div>
<div>
<ul>
<li>A lot of hard work before the sprint by all the local organizers and core developers</li>
<li>More than 200 pull requests sent, around 150 already merged</li>
<li>Many people really loved the experience</li>
<li>Incredible work by the pandas core development team after the sprint</li>
<li>In London, our sprints after the 10th of March have long waiting lists, which was not happening before the #pandasSprint</li>
<li>Several people keep contributing to the pandas documentation after sending their first contribution</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEji3abYkjhA1rqpAJ4vYBuFROsSocK-umcY-OusJ6hv9WK07mAhiH4J7HrpWHktE4rWRHqo7MUlO9wuBevIa5ua6-YQrmOdUTvdBV34dauG6tLsPtsI83EAXpqqPw_8mJWzXmlZcAVlNkp6/s1600/Screenshot+at+2018-03-22+00-41-14.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="189" data-original-width="802" height="75" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEji3abYkjhA1rqpAJ4vYBuFROsSocK-umcY-OusJ6hv9WK07mAhiH4J7HrpWHktE4rWRHqo7MUlO9wuBevIa5ua6-YQrmOdUTvdBV34dauG6tLsPtsI83EAXpqqPw_8mJWzXmlZcAVlNkp6/s320/Screenshot+at+2018-03-22+00-41-14.png" width="320" /></a></div>
<div>
<br /></div>
<div>
And, what I think is more important, we took a small but great step towards making sprints a popular event format in the Python community, adding the missing piece to the numerous conferences, talk-based meetups, dojos, workshops and others.</div>
</div>
<div>
<br /></div>
<div>
Several people asked me when the next one will be. In London we are having two sprints this week. Man AHL is hosting this great <a href="http://ahl.com/hackathon">hackathon</a> in one month. I hope to see other user groups organizing sprints in the future. And about another worldwide sprint... Maybe in a few months we could do a PyData Festival and have 10,000 people contributing to 20 different projects during a whole weekend? :)</div>
Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com2tag:blogger.com,1999:blog-4345066302652424538.post-68341670792077332052017-12-17T00:03:00.000+01:002017-12-18T14:04:39.485+01:00
My NIPS write up
Just as a quick disclaimer: this post is about my personal experience and opinions at NIPS 2017. I'm not an AI researcher; I work as a data scientist in industry. For a more technical summary of the talks and papers presented, you may want to check this <a href="https://cs.brown.edu/~dabel/blog/posts/misc/nips_2017.pdf">document by David Abel</a>.<br />
<br />
<h2>
Deep learning rigor and interpretability</h2>
<div>
This is quite a controversial topic, but this is how I see it. There are two main approaches to the idea of statistics/learning:</div>
<div>
<ol>
<li>Understand how learning works, and replicate it based on this understanding</li>
<li>Focus on results, no matter if it's at the cost of poor understanding</li>
</ol>
<div>
I think these two approaches first divided statisticians from machine learning practitioners, as Leo Breiman describes in <a href="http://www2.math.uu.se/~thulin/mm/breiman.pdf">The two cultures</a>. In a similar way, today they divide the deep learning school, which is somehow winning in terms of results, from other techniques.</div>
</div>
<div>
<br /></div>
<div>
My view on deep learning is that we've managed to understand, in a general way, how the human brain works. Not why, but with the research of people like Santiago Ramón y Cajal, Camillo Golgi, Donald Hebb..., we know that it's a network of neurons, and that the "intelligence" is in how the neurons connect, and not in the neurons themselves.</div>
<div>
<br /></div>
<div>
With the research of Warren McCulloch, Walter Pitts, John Hopfield, Geoffrey Hinton..., we can replicate this structure of neurons in an artificial way, just with a set of connected linear regressions, with activation functions to break the linearity. And with current computation power, including optimized hardware like GPUs, we can implement networks of neurons at a huge scale. We know that the model works, because it works for the human brain, and we're confident it's the same. But we don't know how each neuron is connected in the brain (how much signal it needs to receive from the other neurons to activate), so we miss the weights of the linear regressions.</div>
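<div>
The idea of "connected linear regressions with activation functions" can be sketched in a few lines of plain Python. This is only an illustration: the sigmoid activation, the layer sizes and the hand-picked weights are all assumptions of mine, not something taken from a real model.</div>

```python
import math


def sigmoid(z):
    """Activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: a linear regression followed by an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def forward(inputs, hidden_layer, output_neuron):
    """Feed the inputs through one hidden layer and a single output neuron."""
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    weights, bias = output_neuron
    return neuron(hidden, weights, bias)


# Hand-picked (not learned) weights, only to show the structure.
hidden_layer = [([0.5, -0.2], 0.1), ([-0.3, 0.8], 0.0)]
output_neuron = ([1.0, -1.0], 0.0)
print(forward([1.0, 2.0], hidden_layer, output_neuron))
```

<div>
Without the call to <code>sigmoid</code>, stacking the layers would collapse into a single linear regression, which is why the activation is needed.</div>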
<div>
<br /></div>
<div>
With techniques like backpropagation, stochastic gradient descent... we can optimize the weights to do useful things, like image or sound recognition and generation.</div>
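<div>
As a rough sketch of what "optimizing the weights" means, the following fits a single linear neuron by gradient descent, updating after each sample. With only one neuron, backpropagation reduces to a plain derivative, so this is a simplification. The data, learning rate and number of epochs are made up for the example.</div>

```python
def train(samples, learning_rate=0.05, epochs=500):
    """Fit y = w * x + b by gradient descent, one sample at a time."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y
            # Gradient of the squared error 0.5 * error ** 2 w.r.t. w and b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Toy data generated from y = 2 * x + 1, so the weights to recover are known.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
print(round(w, 2), round(b, 2))
```

<div>
A real implementation would usually shuffle the samples on each pass (the "stochastic" part) and propagate the error through several layers.</div>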
<div>
<br /></div>
<div>
So, as I see it, the main question is:</div>
<div>
<ul>
<li>Does rigor matter: how much we understand about what we do, and how much we understand our models and their predictions? Or do we just care about minimizing the out-of-sample error?</li>
</ul>
<div>
This may be a free interpretation of what was being discussed at NIPS, for example at <a href="https://www.youtube.com/watch?v=Qi1Yry33TQE">Ali Rahimi's talk</a>, or at the <a href="https://www.youtube.com/watch?v=2hW05ZfsUUo">interpretability debate</a>. It was interesting to see how excited people were about the debate, and about the "celebrities" on the stage.</div>
</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
I think something important was missing from the debate, and it's what Chris Olah and Shan Carter describe as <a href="https://distill.pub/2017/research-debt/">research debt</a>. Like in software, what matters is not only what you have today, but also what you will have in the future. The better the internal quality of your software, the easier it will be to improve it and add new features later. I think every good software engineer is aware of how important it is to keep technical debt under control. But I don't think most researchers are aware that our understanding of today's research is key for future research.</div>
<div>
<br /></div>
<div>
So, in my opinion, it's not that important that with deep learning we can have state-of-the-art results in many areas. I don't think we'll have much better results in the future unless we focus on quality research, and not just on trying random things to get a small increase in model accuracy.</div>
<div>
<br /></div>
<h2>
GANs</h2>
<div>
I think Generative Adversarial Networks were by far the most popular topic at NIPS. I'm not sure how many talks <a href="https://twitter.com/goodfellow_ian">Ian Goodfellow</a> gave, but I think it wasn't far from one every day. And there were all sorts of applications of GANs, including many for creativity and design. We're not yet at the point of being able to generate arbitrary high-definition images, but it doesn't seem it'll take that long to have even more impressive results than what we've already seen. One of the most discussed articles was the <a href="http://research.nvidia.com/publication/2017-10_Progressive-Growing-of">GAN that generates celebrity faces</a>.</div>
<h2>
Bayesian statistics</h2>
<div>
Bayesian statistics was also very present during the whole NIPS, many times together with deep learning, like in the <a href="https://www.youtube.com/watch?v=LVBvJsTr3rg">Bayesian deep learning and deep Bayesian learning</a> talk, the <a href="http://bayesiandeeplearning.org/">Bayesian deep learning workshop</a>, or the <a href="https://arxiv.org/abs/1705.09558">Bayesian GAN paper</a>. Gaussian processes and Bayesian optimization were also present, from the tutorials to the workshops.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4PLfmPnPm2bQ31DA6Em4UiUbxMPq976jxyApsi5L1s1Egg9DWBVanSiCypxx5iZWRp6LbBLRWi2qiyMz4TruQJUs3pGsS_yURLVy9Do_wGPfsntdgS1b4fZjK_xKvz1auAR003UfBV4pc/s1600/IMG_20171207_095819.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="899" data-original-width="1600" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4PLfmPnPm2bQ31DA6Em4UiUbxMPq976jxyApsi5L1s1Egg9DWBVanSiCypxx5iZWRp6LbBLRWi2qiyMz4TruQJUs3pGsS_yURLVy9Do_wGPfsntdgS1b4fZjK_xKvz1auAR003UfBV4pc/s320/IMG_20171207_095819.jpg" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Surprisingly to me, most of the papers presented about multi-armed bandit problems were based on frequentist statistics. I say surprisingly because I think the industry is mostly adopting Bayesian methods for A/B testing, one of the main applications. In my opinion Bayesian methods are much simpler and more intuitive, and tend to offer better results. One of the hot topics in this area is lowering the false discovery rate in repeated tests. Many papers about contextual bandits were also presented, an area that I discovered at NIPS.</div>
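<div>
To illustrate what a Bayesian approach to this kind of problem can look like, here is a minimal sketch of Thompson sampling with Beta posteriors, one common Bayesian bandit strategy. It is not taken from any of the papers presented, and the conversion rates, number of rounds and seed are made up for the simulation.</div>

```python
import random


def thompson_sampling(true_rates, rounds=5000, seed=0):
    """Simulate a Bayesian bandit over variants with yes/no outcomes."""
    rng = random.Random(seed)
    # Beta(1, 1) prior for each variant: one pseudo-success, one pseudo-failure.
    successes = [1] * len(true_rates)
    failures = [1] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible conversion rate from each posterior, show the best.
        draws = [rng.betavariate(s, f) for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    # Number of times each variant was shown (subtracting the prior counts).
    return [s + f - 2 for s, f in zip(successes, failures)]


# Made-up conversion rates for variants A and B.
pulls = thompson_sampling([0.05, 0.15])
print(pulls)
```

<div>
The appeal for A/B testing is that traffic shifts towards the better variant while the test is still running, instead of being split evenly until the end.</div>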
<h2>
Reinforcement learning</h2>
Reinforcement learning (RL) was the last of the main topics that kept repeating during the whole NIPS, if I'm not missing any. The work presented was based either on classic Q-learning or on deep learning representations.<br />
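<div>
For readers who haven't seen it, classic tabular Q-learning can be sketched as follows. The environment is a toy corridor I made up for the illustration (five states, a reward of 1 when reaching the last one), and the hyperparameters are arbitrary.</div>

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values for the corridor with tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy: mostly exploit, sometimes explore (ties are random).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = q[state].index(max(q[state]))
            next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Q-learning update: move towards reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy action per non-terminal state: 0 means left, 1 means right.
policy = [row.index(max(row)) for row in q[:-1]]
print(policy)
```

<div>
The deep learning variants replace the table <code>q</code> with a neural network that estimates the same values, which is what makes larger state spaces tractable.</div>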
<h2>
Other topics</h2>
<div>
There were a couple of other topics that I found interesting, and that were new to me:</div>
<div>
<ul>
<li>Optimal transportation</li>
<li>Distribution regression</li>
</ul>
<div>
A great talk, though not because of the technical content, was "Improvised Comedy as a Turing Test", where two researchers and comedians performed improvised comedy with a robot they had implemented themselves:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3xtCxKn30DE1HGQVzPwlYKsWaixPRFKVcCeJcvSi53tm3XgHksWO4t3RURZ5Ra6DvBwfS650mj8cU9gKq8D16q67vwkBPVtOb90cNQiak5gCiW-RMSLj0m6q4J9ThBbZuYPnA0h5VlmlZ/s1600/IMG_20171208_114630.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="899" data-original-width="1600" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3xtCxKn30DE1HGQVzPwlYKsWaixPRFKVcCeJcvSi53tm3XgHksWO4t3RURZ5Ra6DvBwfS650mj8cU9gKq8D16q67vwkBPVtOb90cNQiak5gCiW-RMSLj0m6q4J9ThBbZuYPnA0h5VlmlZ/s320/IMG_20171208_114630.jpg" width="320" /></a></div>
<div>
<br /></div>
<h2>
About the conference</h2>
</div>
<div>
It was my first time attending an academic conference, and some things weren't very intuitive, being used to open source or business conferences. This is a random list with my thoughts:</div>
<div>
<ul>
<li>I found the location quite good:</li>
<ul>
<li>Near a major airport, so I could fly directly from London</li>
<li>Good temperature</li>
<li>Many hotels nearby</li>
<li>English speaking country</li>
<li>The only problem with the location was that people from several countries (e.g. Iran) were banned from attending, as the organizers mentioned on the home page of the conference</li>
</ul>
</ul>
<ul>
<li>I found the use of an app to communicate during the conference quite convenient. Even if the app had some obvious flaws, like the mess with the list of discussions, it added a lot of value</li>
</ul>
<ul>
<li>I found it difficult to know what to expect about food. I think at all the previous conferences I attended (and they are not few), breakfast and lunch were provided. At #NIPS the schedule said that breakfast wasn't offered first thing in the morning, with no other mention of food. Then, breakfast was provided later in the morning (one day the <a href="https://twitter.com/de3ug/status/938846787749552128">breakfast</a> was obviously decided by an algorithm). Lunch wasn't provided, and dinner was provided, but in a different, undisclosed location in the venue. One day dinner was provided twice (the regular one, plus a voucher for a food truck, only valid that day for dinner).</li>
</ul>
<ul>
<li>The sponsors were quite interesting. Not only because I managed to get up to 10 t-shirts (including one with Thomas Bayes's face), but because I had very interesting conversations with many people at the booths. I found the diversity of countries represented in the sponsor area interesting. While one could expect Silicon Valley companies to eclipse the rest, the number of Chinese and English companies was at the same level, and some other countries were represented too, like Canada or Germany. One of the fun things in the sponsor area was the live cameras performing predictions or style transfer:</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiV9z6STHI-NcwM_U_-2As72rBjiSElyasDtdkkzjvhPL55yXF9I-EHWSi3SLjz9bj36amuVTdoqIVJs6WOk6iw0gVXKBTksbUuM0EyxfAYp00uDAF2tWfkkqlHuozvYjMwaKpQnc3zOKbs/s1600/IMG_20171205_112058.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="899" data-original-width="1600" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiV9z6STHI-NcwM_U_-2As72rBjiSElyasDtdkkzjvhPL55yXF9I-EHWSi3SLjz9bj36amuVTdoqIVJs6WOk6iw0gVXKBTksbUuM0EyxfAYp00uDAF2tWfkkqlHuozvYjMwaKpQnc3zOKbs/s320/IMG_20171205_112058.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5JbvKKGyPgsT4OYtMTVc6W_mnHSbTuCs1fumL9OOuVllZ5JDF_ppQTIUh1cO-ujRKCrkdboVSdt95JNN39hsDIYcsDirdbnlPKgLhjFFbcqaYuSr_iZNUle9gUALDYAYYcT-Y_aGSTTsp/s1600/IMG_20171206_103655.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="899" data-original-width="1600" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5JbvKKGyPgsT4OYtMTVc6W_mnHSbTuCs1fumL9OOuVllZ5JDF_ppQTIUh1cO-ujRKCrkdboVSdt95JNN39hsDIYcsDirdbnlPKgLhjFFbcqaYuSr_iZNUle9gUALDYAYYcT-Y_aGSTTsp/s320/IMG_20171206_103655.jpg" width="320" /></a></div>
<div>
<br /></div>
<ul>
<li>Compared to open source conferences, I found the atmosphere at NIPS very different. Maybe it's due to the nature of research and open source, but my experience is that open source conferences have a very collaborative environment. You don't necessarily need to like or use someone else's project to have a friendly discussion or appreciate their contribution. But research felt to me like quite a competitive environment. More than once I saw people in presentations or posters addressing the presenter in a not very nice way, challenging their research and trying to point out that they knew better. I think providing constructive feedback is always great, but I was sad to get the feeling (which may be biased by just the few examples I saw) that researchers see each other more as rivals than as part of a community that delivers together.</li>
</ul>
<h2>
Systems</h2>
</div>
<div>
On the systems side (mainly in the workshop), it was very interesting to see the talks about the main tensor software from the big companies in Silicon Valley:</div>
<div>
<ul>
<li><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="254107028" dir="ltr" href="https://twitter.com/TensorFlow" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@TensorFlow</a><span style="background-color: #f5f8fa; color: #14171a; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"> : </span><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="17661484" dir="ltr" href="https://twitter.com/rajatmonga" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@rajatmonga</a><span style="background-color: #f5f8fa; color: #14171a; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"> </span><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="20536157" dir="ltr" href="https://twitter.com/Google" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@Google</a></li>
<li><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="776585502606721024" dir="ltr" href="https://twitter.com/PyTorch" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@PyTorch</a><span style="background-color: #f5f8fa; color: #14171a; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"> : </span><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="70831441" dir="ltr" href="https://twitter.com/soumithchintala" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@soumithchintala</a><span style="background-color: #f5f8fa; color: #14171a; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"> </span><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="2425151" dir="ltr" href="https://twitter.com/facebook" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@facebook</a></li>
<li>CNTK: Cha Zhang, <a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="74286565" dir="ltr" href="https://twitter.com/Microsoft" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@Microsoft</a></li>
<li><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="824121287790493697" dir="ltr" href="https://twitter.com/ApacheMXNet" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@ApacheMXNet</a><span style="background-color: #f5f8fa; color: #14171a; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"> : </span><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="3187990776" dir="ltr" href="https://twitter.com/tqchenml" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@tqchenml</a><span style="background-color: #f5f8fa; color: #14171a; font-family: "helvetica neue" , "helvetica" , "arial" , sans-serif; font-size: 14px; white-space: pre-wrap;"> </span><a class="twitter-atreply pretty-link js-nav" data-mentioned-user-id="20793816" dir="ltr" href="https://twitter.com/amazon" style="background: rgb(245, 248, 250); color: #1b95e0; font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; text-decoration: none; white-space: pre-wrap;">@amazon</a></li>
</ul>
</div>
<div>
On the fun side, TensorFlow presented their eager mode, and <a href="https://twitter.com/soumithchintala">Soumith Chintala</a> mentioned that "PyTorch implemented the eager mode, before the eager mode existed". Some time later he mentioned that PyTorch will implement distributions soon, the way TensorFlow does. So, the main innovation from each project is copied from the competitor. :)</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYLrFisMA3VZR0EJgJ53joXdzuz7_9ymsFJGqmma5WtyQUFO3LgueMwjJMqJyhXFcvHVwDVUutuVAirxTLQYvBuMLstsf-6f-8I2un0aMOeLsP-949dyUD2sGmAj5ot2Ejb2FbQA4g7tug/s1600/IMG_20171208_133415.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="899" data-original-width="1600" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYLrFisMA3VZR0EJgJ53joXdzuz7_9ymsFJGqmma5WtyQUFO3LgueMwjJMqJyhXFcvHVwDVUutuVAirxTLQYvBuMLstsf-6f-8I2un0aMOeLsP-949dyUD2sGmAj5ot2Ejb2FbQA4g7tug/s320/IMG_20171208_133415.jpg" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Tensors aside, the star of the ML Systems workshop was <a href="http://www.businessinsider.com/astounding-facts-about-googles-most-badass-engineer-jeff-dean-2012-1?IR=T">Jeff Dean</a>. He discussed TPUs, and how Google is creating the infrastructure for training deep learning models. The interest in Google, deep learning and Jeff Dean was at its peak, and the room was as crowded as a room can be. Some time before the talk, I had the honor of meeting Jeff Dean, as the picture proves:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv0pCUGLzqmsjaqUHriZA39bBYuCpkCRxdnJ4yVj0fIKf5TYYtSzffAEgnz_bvTf3Jai1Y9RvQKXSoIsftCj8kcy3Nhpdd4IW4qYNqdYVD1hnNYNCriWo-NkSPvuDeZdO4xcA92laDjJnY/s1600/b7145f54-cb90-4d92-b370-4d938ffb754c.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="898" data-original-width="1600" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv0pCUGLzqmsjaqUHriZA39bBYuCpkCRxdnJ4yVj0fIKf5TYYtSzffAEgnz_bvTf3Jai1Y9RvQKXSoIsftCj8kcy3Nhpdd4IW4qYNqdYVD1hnNYNCriWo-NkSPvuDeZdO4xcA92laDjJnY/s320/b7145f54-cb90-4d92-b370-4d938ffb754c.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjyFBXPl2p7am0xbLlw8scElw7M087rBTmESRAofywYroxDdq98Iu4hln3lYBSgFRNamGHEdc0RvKu9weIV7o6l-APaS1I2O1I4i2OQKCtJPxe-tEmm5FKAoltJzBBrVTOxdx-crj16CPI/s1600/IMG_20171208_145408.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="899" data-original-width="1600" height="179" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjyFBXPl2p7am0xbLlw8scElw7M087rBTmESRAofywYroxDdq98Iu4hln3lYBSgFRNamGHEdc0RvKu9weIV7o6l-APaS1I2O1I4i2OQKCtJPxe-tEmm5FKAoltJzBBrVTOxdx-crj16CPI/s320/IMG_20171208_145408.jpg" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0GsspvPmykKp_g6VsafWqru1FF3gPF3FJUoUgRGzcGkz8aSRFMN4g7RSRF_BLXQudZo3KAXkCHaRQp84ROdWzMNJIzcP2-m308twitNI-P9g6RHiR8fOZm0rLRoDa9379d4A7Iv_wB-Yb/s1600/IMG_20171208_145427.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0GsspvPmykKp_g6VsafWqru1FF3gPF3FJUoUgRGzcGkz8aSRFMN4g7RSRF_BLXQudZo3KAXkCHaRQp84ROdWzMNJIzcP2-m308twitNI-P9g6RHiR8fOZm0rLRoDa9379d4A7Iv_wB-Yb/s320/IMG_20171208_145427.jpg" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
On the more pragmatic side, it was interesting to see the poster about <a href="https://github.com/catboost/catboost">CatBoost</a>, Yandex's version of gradient boosted trees. I found the ideas in the paper quite interesting, and there are several novel parts compared to xgboost. I spent a bit of time testing whether the results were as good as presented, but the documentation is not yet as good as it could be, the API is a bit confusing, and I finally gave up.</div>
<div>
<br /></div>
<div>
One of the most interesting insights from NIPS wasn't actually presented. It came from a discussion with <a href="https://twitter.com/GaelVaroquaux">Gael Varoquaux</a>, core contributor of scikit-learn. I wanted to talk with him about scikit-learn, and see if we could help with its development as part of the <a href="https://www.meetup.com/Python-Sprints/">London Python Sprints</a> group. But given the current state and the nature of the project, that doesn't seem very useful at this point (see <a href="https://datapythonista.blogspot.co.uk/2017/12/my-nips-write-up.html?showComment=1513602207828#c4342955784338059589">this comment</a> for clarification on this). What was interesting about the conversation was discovering the new <a href="https://github.com/scikit-learn/scikit-learn/pull/9012">ColumnTransformer</a>. While it's not yet merged, a pull request already exists to be able to apply sklearn transformers to a subset of columns. At the moment sklearn doesn't provide an easy way to do this (or a way that lets you understand your models later), and I think most of us were implementing this ourselves in our own projects.</div>
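<div>
<br /></div>
<div>
To make the idea concrete, here is a minimal pure-Python sketch of what "applying transformers to a subset of columns" means. It is not scikit-learn's API: the helper names (<code>scale_to_unit</code>, <code>one_hot</code>, <code>transform_columns</code>) and the toy table are made up for illustration, and the real ColumnTransformer works with fitted transformer objects rather than plain functions.</div>

```python
def scale_to_unit(values):
    """Min-max scale a list of numbers into the range [0, 1]."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero for constant columns
    return [(v - low) / span for v in values]


def one_hot(values):
    """Encode each value as a 0/1 list over the sorted distinct categories."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]


def transform_columns(table, transformers):
    """Apply each transformer only to its own column; copy the rest unchanged.

    `table` maps column names to lists of values, and `transformers`
    maps a subset of those column names to functions (hypothetical helper).
    """
    return {
        name: transformers[name](column) if name in transformers else list(column)
        for name, column in table.items()
    }


# Toy data: one numeric column, one categorical column, one left untouched.
table = {
    "age": [20, 30, 40],
    "city": ["London", "Paris", "London"],
    "id": [1, 2, 3],
}
result = transform_columns(table, {"age": scale_to_unit, "city": one_hot})
print(result)
```

<div>
The mapping from column name to transformation is explicit in one place, which is what makes it easier to understand later what was done to each column.</div>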
<h2>
A sad story</h2>
<div>
To conclude, I want to mention something that I didn't experience myself at NIPS, but that many of us read later on: <a href="https://medium.com/@kristianlum/statistics-we-have-a-problem-304638dc5de5">Kristian Lum's story</a> about sexual harassment in research. Hopefully all this wave of scandals is the beginning of the end, from English politicians, to Hollywood... And it may not be fair, but while it is equally disgusting as all the other cases, I found it more surprising in research. That the brightest minds in their fields have been abusing and abused is something that I find more shocking than in an industry like Hollywood.</div>
<div>
<br /></div>
<div>
The second part of the story, this one with names, came not much later, in this <a href="https://www.bloomberg.com/news/articles/2017-12-16/google-researcher-accused-of-sexual-harassment-roiling-ai-field">Bloomberg article</a>.</div>
<div>
<br /></div>
<div>
On a positive note, I think the problem is not that difficult to solve. In the Python community I think we've got all the mechanisms in place to avoid these problems as much as possible: from strict codes of conduct, to whistleblower channels in conferences like EuroPython, to a friendly and inclusive environment. The paradox is that the proportion of female attendees in Python conferences is much smaller than what I saw at NIPS. I'd bet that a larger number of women should make these cases less likely.</div>
<div>
<br /></div>
<div>
I hope the example of Kristian is not only useful to fix this specific case, but also makes it easier for other people to speak up, and puts an end to this forever.</div>
<br />
<h2>
Assigning yourself to a GitHub issue</h2>
Published 2017-10-13<br />
<br />
Contributing to open source is one of the most rewarding experiences one can find. Just finding a bug or a new cool feature of a widely used library, working on it, and sharing it with the rest of the users. This is how open source has become so great and so widely used.<br />
<br />
The workflow just described is relatively simple at a small scale, but can become trickier when many people are working on the same project at the same time.<br />
<br />
One idea I have in mind is to create a macro-sprint, where many Python user groups from all around the world sprint on improving the <a href="https://github.com/pandas-dev/pandas">Pandas</a> documentation. The Pandas documentation isn't bad, but it could easily be improved by adding more examples to the DataFrame and Series methods. An example of a page that could be improved this way is the <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rmul.html#pandas.Series.rmul">Series rmul method</a>.<br />
<br />
To organize this, every sprinting team could get a subset of methods. For example, one of the teams could work on the <a href="https://pandas.pydata.org/pandas-docs/stable/api.html#conversion">Series conversion methods</a>. This is a bit tricky, but even with a simple online spreadsheet with all the method categories, we could assign each to a group.<br />
<br />
Then, in a sprint with 20 people working on the same category of methods, we would create another spreadsheet listing each method, and every programmer could assign themselves to the method they want to work on. That way nobody else works on it, which avoids a lot of wasted time and duplicated work.<br />
<br />
But of course, this is very tricky. In a coordinated sprint, working on something very structured like Pandas methods, it could work. But it sounds ridiculous that each project should need a spreadsheet with the list of issues, so every programmer can let the others know what she or he is working on.<br />
<br />
This was a solved problem 10 years ago when I was quite involved with the <a href="https://www.djangoproject.com/">Django</a> community. At that time, Django was using <a href="https://trac.edgewall.org/">Trac</a> to manage the tickets. And every ticket had an "Assigned to" field, where a programmer could let others know that they shouldn't work on it without talking to her or him first.<br />
<br />
Why is this an issue today? While there are few companies that have done as much as <a href="https://github.com/">GitHub</a> for the open source community, I think they made a big mistake. GitHub also has the "Assigned to" field, but this can only be edited by core developers of the project.<br />
<br />
Core developers are surely one of the bottlenecks of every open source community. Coming back to Pandas, there are at the time of writing this post, 100 open pull requests. So, it doesn't seem a good idea, that every time you want to work on an issue, you need to bother a core developer, so she or he assigns the ticket to you.<br />
<br />
Is this affecting the open source community? It's difficult to tell, but if we compare the number of assigned tickets in Pandas and Python, we can see that Pandas has 2,039 open issues, but only 30 of them are assigned (I bet all of them to core developers).<br />
<br />
In comparison, if we check the <a href="https://bugs.python.org/">Python bug tracker</a> (Python uses GitHub for the code, but not for the issues), we can see that around 50% of the tickets seem to be assigned to someone.<br />
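<br />
To put the two figures side by side, here is a small sketch of the arithmetic. The Pandas counts are the ones quoted above; the 50% for the Python tracker is only the rough estimate from this post, not a measured value, and the variable names are just illustrative.<br />

```python
# Figures quoted in the post for Pandas on GitHub.
pandas_open_issues = 2039
pandas_assigned_issues = 30

# Rough estimate from the post for the Python bug tracker (assumption).
python_assigned_share = 0.50

# Share of open Pandas issues that have an assignee.
pandas_assigned_share = pandas_assigned_issues / pandas_open_issues

# How many times larger the assigned share is on the Python tracker.
ratio = python_assigned_share / pandas_assigned_share

print(f"Pandas: {pandas_assigned_share:.1%} of open issues are assigned")
print(f"Python tracker: roughly {ratio:.0f} times that share")
```

So about 1.5% of the open Pandas issues have an assignee, and the estimated share on the Python tracker is roughly 34 times larger.<br />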
<br />
It's difficult to tell what the effect is on code contributions, beyond ticket assignment, but it's reasonable to think that GitHub is discouraging users from contributing by not letting them assign issues to themselves.<br />
<br />
As shown in this <a href="https://github.com/isaacs/github/issues/100">thread</a>, the creator of npm requested this feature in 2013. 4 years later, there are many +1's in this unofficial ticket (it's not a ticket for GitHub developers, it's for the creator of npm himself, to keep track of his request to GitHub). But the feature is still missing.<br />
<br />
Why GitHub is against, or has no interest in, a feature so obviously needed to have a healthy open source community is a mystery to me. But if you feel like I feel, please let <a href="https://github.com/contact">GitHub support</a> know.<br />
<br />
<h2>
PyData London 2017, write up</h2>
Published 2017-05-09<br />
<br />
This is a post about my experience at <a href="https://pydata.org/london2017/">PyData London 2017</a>. About what I liked, what I learnt... Note that with 4 tracks and so many people, my opinions are very biased. If you want to know how your experience would be, it'll be amazing, but different than mine. :)<br />
<br />
On the organization side, I think it's been excellent. Everything worked as expected, and when I had a problem with the wifi, the organizers fixed it literally in a couple of minutes. It was great to have sushi and burritos instead of last year's sandwiches too. The Slack channels were quite useful and well organized. I think the organizers deserve a 10, and that's very challenging when organizing a conference.<br />
<br />
More on the content side, I used to attend conferences mainly for talks. But this year I decided to try other things a conference can offer (networking, sprints, unconference sessions...). Some random notes:<br />
<br />
<b><span style="font-size: x-large;">Bayesian stuff</span></b><br />
<b><br /></b>
I think probabilistic models is the area of data science with the highest entry barrier. This is a personal opinion, but shared by many others, including authors:<br />
<br />
<div style="text-align: center;">
<i style="font-style: italic;">The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.</i></div>
<div style="text-align: center;">
<a href="https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers">Cameron Davidson-Pilon</a></div>
<br />
It looks like there is even terminology to define whether the approach used is mathematical (formulae and proofs quite cryptic to me), or computational (more focused on the implementation).<br />
<br />
It was a luxury to have at PyData once more <a href="https://www.linkedin.com/in/vincentwarmerdam/">Vincent Warmerdam</a>, from the PyData Amsterdam organization. He has been one step ahead of most of us, who are more focused on machine learning (I haven't met any frequentist so far at PyData conferences). He already gave a talk about the topic last year, <a href="https://www.youtube.com/watch?v=BiYTLb-o1Dk&list=PLGVZCDnMOq0rzDLHi5WxWmN5vueHU5Ar7&index=2">The Duct Tape of Heroes: Bayes Rule</a>, which was quite inspiring and made probabilistic models easier, and this year we've got another amazing talk, <a href="https://pydata.org/london2017/schedule/presentation/36/">SaaaS: Sampling as an Algorithm Service</a>.<br />
<br />
After that, we managed to have an unconference session with him, where we could see the examples presented in the talk in more detail. While Markov Chain Monte Carlo or Gibbs sampling aren't straightforward to learn, I think we all learnt a lot, enough to finish learning the details by ourselves.<br />
<br />
There were other sessions about Bayesian stuff too:<br />
<br />
<ul>
<li><a href="https://pydata.org/london2017/schedule/presentation/61/" style="box-sizing: border-box; color: #337ab7; text-decoration-line: none; transition: all 295ms ease;">Bayesian optimisation with scikit-learn</a> - Thomas Huijskens</li>
<li><a href="https://pydata.org/london2017/schedule/presentation/15/" style="box-sizing: border-box; color: #337ab7; text-decoration-line: none; transition: all 295ms ease;">Variational Inference and Python</a> - Peadar Coyle</li>
<li><a href="https://pydata.org/london2017/schedule/presentation/33/" style="box-sizing: border-box; color: #337ab7; text-decoration-line: none; transition: all 295ms ease;">Bayesian Deep Learning with Edward (and a trick using Dropout)</a> - Andrew Rowan</li>
<li><a href="https://pydata.org/london2017/schedule/presentation/45/" style="box-sizing: border-box; color: #337ab7; text-decoration-line: none; transition: all 295ms ease;">Segmenting Channel 4 Viewers using LDA Topic Modelling</a> - Thomas Nuttall</li>
</ul>
<div>
And probably some others that I'm missing, so it looks like the interest in the area is growing, and <a href="https://github.com/pymc-devs/pymc3">PyMC3</a> looks to be the preferred option of most people.</div>
<br />
<br />
I got good recommendations of books related to probabilistic models and Bayesian stuff, which shouldn't take the tough, purely mathematical approach:<br />
<br />
<ul>
<li><a href="https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers">Bayesian methods for Hackers</a></li>
<li><a href="http://www.inference.eng.cam.ac.uk/mackay/itila/">Information theory, inference and learning algorithms</a></li>
<li><a href="https://www.amazon.co.uk/Computer-Age-Statistical-Inference-Mathematical/dp/1107149894/ref=sr_1_1?s=books&ie=UTF8&qid=1494246251&sr=1-1&keywords=computer+age+statistical+inference">Computer age statistical inference</a></li>
<li><a href="https://www.crcpress.com/Statistical-Rethinking-A-Bayesian-Course-with-Examples-in-R-and-Stan/McElreath/p/book/9781482253443">Statistical Rethinking: A Bayesian course with examples in R and Stan</a></li>
</ul>
<br />
There is a Meetup in London, which is the place to be to meet other Bayesians:<br />
<br />
<ul>
<li><a href="https://www.meetup.com/Bayesian-Mixer-London/">Bayesian Mixer London</a></li>
</ul>
<br />
<span style="font-size: x-large;"><b>Frequentist stuff</b></span><br />
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
&lt;This space is for sale, contact the administrator of the page&gt;</div>
<div style="text-align: center;">
<br /></div>
<div style="text-align: left;">
<span style="font-size: x-large;"><b>Topic modeling and Gensim</b></span></div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
Another topic that looks to be trending is topic modelling, using vector spaces for NLP, and Gensim in particular. That includes Latent Dirichlet allocation, one of the most amazing algorithms I've seen in action.</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
We also got a Gensim sprint during the conference, and we could learn not only about what Gensim does, but also why it is a great open source project. In the past I could see how Gensim was able to return the most similar documents immediately, in a dataset with more than one million samples. While the documentation gives many hints on how Gensim was designed with performance in mind, it was a pleasure to participate in a Gensim sprint, and see the code and the people who make this happen in action.</div>
<div style="text-align: left;">
<br /></div>
Amazing also to see how <a href="https://www.linkedin.com/in/levkonst/">Lev Konstantinovskiy</a> managed to run a tutorial, a talk, a sprint and a lightning talk, during the conference.<br />
<br />
<b style="font-size: x-large;">From theory to practice</b><br />
<br />
It may be just my impression, but I'd say there have been more talks on applications of data science, and more diverse ones. While I remember talks on common applications like recommender systems in previous editions, I think there has been an increase in talks on applications of all these techniques, in different areas.<br />
<br />
To name a few:<br />
<ul>
<li><a href="https://pydata.org/london2017/schedule/presentation/38/">Data science used to see the popularity of users in a Muslim dating app</a></li>
<li><a href="https://pydata.org/london2017/schedule/presentation/16/">Intelligent ventilators, that make newborns breathe when they need it</a></li>
<li><a href="https://pydata.org/london2017/schedule/presentation/43/">Electrocardiogram analysis with time series techniques</a></li>
</ul>
<div>
Also, the astronomy/aerospace communities look to be quite active inside the PyData community.</div>
<div>
<br /></div>
<div>
<span style="font-size: x-large;"><b>Data activism</b></span></div>
<div>
<br /></div>
<div>
Another area which I'd say is growing is data activism, or how to use data in a social or political way. We got a keynote on fact checking, and another about analyzing data for good, to prevent money laundering using government information.</div>
<div>
<br /></div>
<div>
<a href="http://www.datakind.org/chapters/datakind-uk">DataKind UK</a> looks to be the place to be, to participate in these efforts.</div>
<div>
<br /></div>
<div>
<b style="font-size: x-large;">Pub Quiz</b></div>
<br />
<div style="text-align: center;">
<i>That awkward moment when you thought you knew Python, but James Powell is your interviewer...</i></div>
<br />
Ok, it wasn't an interview, it was a pub quiz, but the feeling was somehow similar. 10 years working in Python, I passed challenging technical interviews for companies such as Bank of America or Google, and at some point you start to think you know what you're doing.<br />
<br />
Then, when you're relaxed in a pub, after an amazing but exhausting day, <a href="https://twitter.com/dontusethiscode">James Powell</a> starts running the pub quiz, and you feel that you don't know anything about Python. Some new Python 3 syntax, all time namespace tricks, and so many atypical cases...<br />
<br />
Luckily, all the dots started to connect, and I realized that a few hours before, I had been talking with <a href="https://twitter.com/holdenweb">Steve Holden</a> about the new edition of his book <a href="http://shop.oreilly.com/product/0636920012610.do">Python in a Nutshell</a>. It had sounded like an introduction to me, but it looks like it covers all the Python internals.<br />
<br />
Going back to the pub quiz, I think it's one of the most memorable moments in a conference. Great people, loads of laughs, and an amazing set of questions perfectly executed.<br />
<br />
<span style="font-size: x-large;"><b>Big Data becoming smaller</b></span><br />
<br />
As I mentioned before, my experience at the conference is very biased, and very influenced by the talks I attended, the people I met... But my impression is that the boom in big data (large deep networks, Spark...) is not a boom anymore.<br />
<br />
Of course there are a lot of people working with Spark, and researching deep neural networks, but instead of growing, I felt like these things are losing momentum, and people are focusing on other technologies and topics.<br />
<br />
<b style="font-size: x-large;">Meetup groups</b><br />
<br />
One of the things I was interested in was finding new interesting meetups. I think among the most popular ones in data science are:<br />
<br />
<ul>
<li><a href="https://www.meetup.com/PyData-London-Meetup/">https://www.meetup.com/PyData-London-Meetup/</a></li>
<li><a href="https://www.meetup.com/London-Machine-Learning-Meetup/">https://www.meetup.com/London-Machine-Learning-Meetup/</a></li>
<li><a href="https://www.meetup.com/London-ODSC/">https://www.meetup.com/London-ODSC/</a></li>
</ul>
<div>
But I met many organizers of other very interesting meetups at the conference:</div>
<div>
<ul>
<li><a href="https://www.meetup.com/London-Data-Science-Journal-Club/">https://www.meetup.com/London-Data-Science-Journal-Club/</a></li>
<li><a href="https://www.meetup.com/London-Kaggle-Meetup/">https://www.meetup.com/London-Kaggle-Meetup/</a></li>
<li><a href="https://www.meetup.com/project_euler/">https://www.meetup.com/project_euler/</a></li>
<li><a href="https://www.meetup.com/DataKind-UK/">https://www.meetup.com/DataKind-UK/</a></li>
</ul>
<div>
<b style="font-size: x-large;">Some obvious things</b></div>
</div>
<div>
<br /></div>
<div>
To conclude, there are a couple of tools/packages I discovered, that it seemed everybody else was already aware of.</div>
<div>
<br /></div>
<div>
It looks like at some point, instant messaging of most free software projects moved from IRC to <a href="https://gitter.im/">gitter</a>. There you can find data science communities, like pandas, scikit-learn, as well as other non data science, like Django. </div>
<div>
<br /></div>
<div>
A package that many people seem to be using is <a href="https://github.com/noamraph/tqdm">tqdm</a>. You can wrap it around an iterable (much like enumerate), and it shows a progress bar while the iteration is running. Funny that, besides being an abbreviation of progress in Arabic, it's an abbreviation for "I want/love you too much" in Spanish.</div>
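<div>
<br /></div>
<div>
With tqdm installed, the usual pattern is <code>from tqdm import tqdm</code> and then <code>for item in tqdm(iterable)</code>. To show the idea without any dependency, here is a toy standard-library sketch of a wrapper that behaves in a similar spirit. The <code>progress</code> function is a made-up name for illustration, not part of tqdm, and it is far simpler than the real thing (no timing, no rate estimates).</div>

```python
import sys


def progress(iterable, total=None, width=20, stream=sys.stderr):
    """Yield the items of `iterable`, redrawing a text progress bar as it goes."""
    if total is None:
        try:
            total = len(iterable)
        except TypeError:
            total = None  # generators and other unsized iterables
    for count, item in enumerate(iterable, start=1):
        yield item
        if total:
            filled = width * count // total
            bar = "#" * filled + "-" * (width - filled)
            stream.write(f"\r[{bar}] {count}/{total}")
        else:
            # Without a known length we can only show a running count.
            stream.write(f"\r{count} items")
        stream.flush()
    stream.write("\n")


# The wrapper is transparent: the loop sees exactly the original items.
squares = [n * n for n in progress(range(5))]
print(squares)
```

<div>
Because the wrapper is a generator that simply re-yields each item, adding it to or removing it from a loop doesn't change the loop's result, which is a big part of why this kind of tool is so easy to adopt.</div>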
<div>
<br /></div>
<div>
<span style="font-size: x-large;"><b>What's next?</b></span></div>
<div>
<br /></div>
<div>
Good news. If you couldn't attend PyData London 2017, or you didn't have enough of it, there are some things you can do:</div>
<div>
<ul>
<li>Attend PyData Barcelona 2017, which will be as amazing as PyData London, also in English, and with top speakers like <a href="https://twitter.com/teoliphant">Travis Oliphant</a> (author of scipy and numpy) or <a href="https://twitter.com/francescalted?lang=en">Francesc Alted</a> (author of PyTables, Blosc, bcolz and numexpr).</li>
<li>Wait until the videos are published in the <a href="https://www.youtube.com/user/PyDataTV">PyData channel</a> (or watch the ones from other PyData conferences)</li>
<li>Join one of the 55 <a href="https://www.meetup.com/pro/pydata/">PyData meetups</a> around the world, or start yours (check <a href="https://docs.google.com/document/d/1ozK-MXUEANuO-xN3tQSCQ7AaqOkubKBeivqC3s-gB8I/edit">this document</a> to see how, <a href="https://www.numfocus.org/">NumFOCUS</a> will support you).</li>
<li>Join one of the other conferences happening later this year in Paris, Berlin, EuroPython in Italy, Warsaw... You can find all of them at <a href="https://pydata.org/">https://pydata.org/</a></li>
</ul>
</div>
<br />
<h2>
PyData write-up</h2>
Published 2016-05-12<br />
<br />
This last weekend I went to my third PyData, the one in London, and it's been such a great experience.<br />
<br />
Before, I went to PyData Amsterdam, and PyData Madrid, also this year.<br />
<br />
After the three conferences, which were very similar, but quite different at the same time, I just wanted to share what I liked, and what <b>in my opinion</b> could be improved. I hope future organizers can find some useful information in my ideas and thoughts. And that includes my future self, for when I'm an organizer.<br />
<br />
<b>Organization</b><br />
<b><br /></b>
I wasn't involved that much in the organization, but my belief is that more should be delegated. I couldn't see it that much in Amsterdam, but organizers of both Madrid and London looked extremely exhausted at the end of the conference. Maybe I'm too optimistic, but I'd say that more people would like to help. I think a good idea is probably to find volunteers for specific tasks. For example, some people would probably be happy to help with registration. And organizers would have more time for other things, and to rest.<br />
<br />
<b>Event hosts</b><br />
<b><br /></b>
I think in the three conferences there were amazing hosts (the people who gave the welcome speeches, closing notes...). Vincent and the Italian guy (sorry for not remembering your name if you read this) in Amsterdam, Guillem in Madrid, and Ian and Emlyn in London. I think having hosts with great humour and communication skills makes a difference to the whole conference.<br />
<div>
<div>
<b>Communication</b></div>
<div>
<b><br /></b></div>
<div>
I think communication is quite important during the conference. In Madrid it was great (and somewhat easy), because there was only a single track, so organizers could give any information to all attendees between talks (where the beers would be, reminders to sign up for lightning talks...). In Amsterdam, with 2 tracks, they managed it very well.</div>
<div>
<br /></div>
<div>
In London, I think the communication could be better. With 4 tracks it gets much more challenging, but I think just a bit more communication was needed, like reminders about the lightning talks, about the tweeted photos contest...</div>
<div>
<br /></div>
<div>
I personally didn't like Slack that much (it was my first time using it). The mobile version (the web, not the app) is not very intuitive, and I had problems finding the channels. I prefer Twitter, to be honest.</div>
<div>
<br /></div>
<b>Networking</b></div>
<div>
<b><br /></b></div>
<div>
I met really great people at all the conferences. I don't think other industries have a community as great as PyData's (and Python's). I didn't see anyone trying to sell their product; it was more about sharing, and getting to know what others do. I really like that.<br />
<br />
I'm not sure if it's just my perception, but I think in London the breaks (breakfast, lunch...) were much shorter. I think London was the conference with the highest number of proposals among the three, so they tried to accommodate the maximum number of talks, but I personally would prefer to have more time for networking, even if that means a few fewer talks.<br />
<br />
<b>Keynotes</b><br />
<b><br /></b>Good keynotes in general. Of course not every PyData is lucky enough to have a keynote from Travis Oliphant or Wes McKinney, but the level was quite good.<br />
<br />
There were just a couple of things I couldn't understand (and neither could the people I talked to about them):<br />
<br />
<ul>
<li>In Madrid, the talk by Jaime (a numpy core developer) should have been a keynote. Even if there were already two high-level keynotes, from Christine and Francesc, I think people need to know that a talk from Jaime (an amazing one btw) is not the same as the one I gave.</li>
<li>In London, the opposite: I couldn't see why Tetiana's talk was a keynote. I won't say that the talk was bad, it was all right, but it was not at the level of Travis's or Andreas's for sure, and IMO it should have been a normal talk, with other talks scheduled at the same time.</li>
</ul>
</div>
<div>
<b>Talks</b></div>
<div>
<b><br /></b></div>
<div>
Very good level. Of course there are some talks better than others, but in general I was quite happy with most of them.</div>
<div>
<br /></div>
<div>
As they are (or will be) on YouTube, here are the ones I liked the most:</div>
<div>
<ul>
<li><a href="https://www.youtube.com/watch?v=-aFTKM3nmZo">Travis Oliphant - KEYNOTE: Scaling Out PyData</a></li>
<li><a href="https://www.youtube.com/watch?v=BXID4teFfDc">Andreas Freise - KEYNOTE: Laser ranging in a new dimension</a></li>
<li><a href="https://www.youtube.com/watch?v=fli-yE5grtY">Linda Uruchurtu - Survival Analysis in Python and R</a></li>
<li><a href="https://www.youtube.com/watch?v=jgqTofYLMHM">Or Weizman - A B Testing: Harder than just a color change</a></li>
<li><a href="https://www.youtube.com/watch?v=By8xlYOCwws&list=PLGVZCDnMOq0o43_3tHLAblfdOWwFFg76T&index=10">Francesc Alted - New Computer Trends and How This Affects Us</a></li>
<li><a href="https://www.youtube.com/watch?v=o0EacbIbf58&list=PLGVZCDnMOq0o43_3tHLAblfdOWwFFg76T&index=11">Jaime Fernández - The Future of NumPy Indexing.</a></li>
<li><a href="https://www.youtube.com/watch?v=uju4RXEniA8&list=PLGVZCDnMOq0o43_3tHLAblfdOWwFFg76T&index=14">Ricardo Pio Monti - Modelling a text corpus using Deep Boltzmann Machines</a></li>
<li><a href="https://www.youtube.com/watch?v=BiYTLb-o1Dk&list=PLGVZCDnMOq0rzDLHi5WxWmN5vueHU5Ar7&index=2">Vincent Warmerdam - The Duct Tape of Heroes: Bayes Rule</a></li>
<li><a href="https://www.youtube.com/watch?v=GFcFNccbDM8&list=PLGVZCDnMOq0rzDLHi5WxWmN5vueHU5Ar7&index=25">Dennis Bohle, Ben Teeuwen - Realtime Bayesian A-B testing with Spark Streaming</a></li>
</ul>
<div>
Of course this list is missing many amazing talks, but these are among the ones I remember liking the most.</div>
</div>
<div>
<br /></div>
<div>
<b>Lightning talks</b></div>
<div>
<br /></div>
<div>
To me, lightning talks are probably the best part of a conference. I really like that in Madrid they had lightning talks on both Saturday and Sunday.</div>
<div>
<br /></div>
<div>
And for me, it was a mistake to have the lightning talks on Sunday both in Amsterdam and London. First, because people from abroad usually have to miss the end of the conference. And also, because it's great for networking to see all the lightning talks on Saturday, and be able to talk to the speakers on Sunday if you share the same interest.</div>
<div>
<br /></div>
<div>
So, IMO, the best is to have them at the end of both days, or on Saturday if they are on just one of the days.</div>
<div>
<br /></div>
<div>
<b>Unconference presentations</b></div>
<div>
<b><br /></b></div>
<div>
This is very biased by my personal experience, but I think the unconference presentation format was a failure. From what I could see, it worked well for the workshop Vincent gave, because he was one of the speakers, and could tell a large audience about his workshop. But for the rest, I don't think the majority of the attendees knew what was in that slot.</div>
<div>
<br /></div>
<div>
Just 4 people attended my talk about machine learning for digital advertising. I want to believe that if the title of the presentation had been on the schedule, many more people would have attended. So, in my opinion, if unconference presentations are part of future conferences, the online schedule should be updated, and there should be a (big) board showing what is going on in that track.</div>
<div>
<br /></div>
<div>
<b>Food</b><br />
<b><br /></b>Comparing the three conferences, I think the food was much better in Amsterdam than in Madrid or London. In Madrid they had special meals for people who requested them (vegetarian, allergies...); I don't know about the other conferences. It's difficult to say if it's better to spend more money on better food. Of course people like better food, but they also like cheaper tickets, and higher contributions to free software projects.<br />
<br />
What I could see is that more people decided to go to restaurants in Madrid and London than in Amsterdam. Ok, in Amsterdam there weren't any restaurants around, but I think better food is better for networking. The best is probably to find a good sponsor that pays for nice food, but that looks tricky. So, I think all options are all right.<br />
<br />
<b>Conclusion</b><br />
<b><br /></b>
The whole experience of PyData 2016 has been amazing. Exhausting (especially the ones I had to fly to), but amazing, and really worth it.<br />
<br />
The organizers have done an amazing job, both the local communities and, from what I could see and hear, the people from NumFOCUS.<br />
<br />
Now I have a beautiful laptop full of stickers, and several PyData T-shirts.<br />
<br />
There are a few minor things that <b>in my opinion</b> could be improved, to make the conference even better:<br />
<br />
<ul>
<li>More time for networking</li>
<li>More communication from the organizers (telling people all the time what is going on: sign-up for lightning talks, unconferences, problems with the wifi, beers planned, community announcements, and even the smaller things)</li>
<li>More lightning talks</li>
<li>Labelling as keynotes the talks that really make a difference</li>
</ul>
<div>
Thank you very much to all the people that made them possible, and see you again there next year!</div>
</div>
Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com3tag:blogger.com,1999:blog-4345066302652424538.post-3014708367446568132015-12-23T19:38:00.000+01:002016-01-20T00:26:30.021+01:00After Fedora installation tasksWhat do I do after installing Fedora 23 MATE-Compiz?<br />
<br />
<ul>
<li>Install Google Chrome</li>
<li>Merge both panels to the bottom, and auto-hide it</li>
<li>Change mouse setup to allow touchpad click and double finger scroll</li>
<li>Change look and feel setup to select window when the mouse moves over it</li>
<li>Disable screensaver</li>
<li>Change terminal shortcuts</li>
<li>sudo dnf update</li>
<li>sudo dnf groupinstall "Development Tools"</li>
<li>sudo rpm -ivh http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-stable.noarch.rpm</li>
<li>sudo dnf install vim-enhanced git vlc gimp inkscape unzip</li>
<li>Install Anaconda</li>
<li>Copy my settings files: .vimrc .gitconfig .ssh</li>
<li>Add aliases to .bashrc:</li>
<ul>
<li>alias vi="vim"</li>
<li>alias rgrep="grep -R"</li>
</ul>
<li>In Power Management, set up the computer to blank screen when laptop lid is closed</li>
</ul>
Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-45387170090334847892015-12-22T14:07:00.001+01:002017-06-26T19:30:04.050+02:00Jupyter environment setupThis is a short note about how I set up my "data scientist" environment. Different people have different tastes, but what I use, and what I set up is:<br />
<br />
<ul>
<li><b>conda</b> for environment and package management (roughly equivalent to virtualenv and pip)</li>
<li>Latest <b>Python</b> (yes, Python 3)</li>
<li><b>Jupyter</b> (aka IPython notebook)</li>
<li>Disable all the autocomplete quotes and brackets stuff, that comes by default with Jupyter</li>
<li>Set the IPython backend for matplotlib</li>
</ul>
<div>
So, we download Anaconda from: https://www.continuum.io/downloads (Linux 64 bits, Python 3, in my case). We install it by:</div>
<div>
<br /></div>
<blockquote class="tr_bq">
bash Anaconda3-2.4.1-Linux-x86_64.sh</blockquote>
<div>
<br /></div>
<div>
We can either restart the terminal, or type the next command, so we start using the conda environment:</div>
<div>
<br /></div>
<blockquote class="tr_bq">
. ~/.bashrc</blockquote>
<div>
<br /></div>
<div>
We can update conda and all packages:</div>
<div>
<br /></div>
<blockquote class="tr_bq">
conda update conda && conda update --all</blockquote>
<div>
<br /></div>
<div>
Then we create a new conda environment (this way we can change package versions without affecting the main conda packages). We name it myenv and specify the packages we want (numpy, pandas...).</div>
<div>
<br /></div>
<blockquote class="tr_bq">
conda create --name myenv jupyter numpy scipy pandas matplotlib scikit-learn bokeh</blockquote>
<div>
<br /></div>
<div>
We activate the new environment:</div>
<div>
<br /></div>
<blockquote class="tr_bq">
source activate myenv</blockquote>
<div>
<br /></div>
<div>
Now that we have everything we wanted installed, let's change the configuration.</div>
<div>
<br /></div>
<div>
We start by creating a default ipython profile.</div>
<div>
<br /></div>
<blockquote class="tr_bq">
ipython profile create</blockquote>
<div>
<br /></div>
<div>
Then we edit the file ~/.ipython/profile_default/ipython_kernel_config.py and we add the next lines to make matplotlib display the images with the inline backend, and with a decent size:</div>
<div>
<br /></div>
<blockquote class="tr_bq">
c.InteractiveShellApp.matplotlib = 'inline'
c.InlineBackend.rc = {'font.size': 10, 'figure.figsize': (18., 9.), 'figure.facecolor': 'white', 'savefig.dpi': 72, 'figure.subplot.bottom': 0.125, 'figure.edgecolor': 'white'}
</blockquote>
<br />
<div>
<br /></div>
<div>
To disable autoclosing brackets, run in a notebook:</div>
<div>
<br /></div>
<pre style="background-color: #f6f8fa; border-radius: 3px; box-sizing: border-box; color: #24292e; font-family: SFMono-Regular, Consolas, "Liberation Mono", Menlo, Courier, monospace; font-size: 11.9px; font-stretch: normal; line-height: 1.45; overflow: auto; padding: 16px; word-break: normal; word-wrap: normal;"><span class="pl-k" style="box-sizing: border-box; color: #d73a49;">from</span> notebook.services.config <span class="pl-k" style="box-sizing: border-box; color: #d73a49;">import</span> ConfigManager
c <span class="pl-k" style="box-sizing: border-box; color: #d73a49;">=</span> ConfigManager()
c.update(<span class="pl-s" style="box-sizing: border-box; color: #032f62;"><span class="pl-pds" style="box-sizing: border-box;">'</span>notebook<span class="pl-pds" style="box-sizing: border-box;">'</span></span>, {<span class="pl-s" style="box-sizing: border-box; color: #032f62;"><span class="pl-pds" style="box-sizing: border-box;">"</span>CodeCell<span class="pl-pds" style="box-sizing: border-box;">"</span></span>: {<span class="pl-s" style="box-sizing: border-box; color: #032f62;"><span class="pl-pds" style="box-sizing: border-box;">"</span>cm_config<span class="pl-pds" style="box-sizing: border-box;">"</span></span>: {<span class="pl-s" style="box-sizing: border-box; color: #032f62;"><span class="pl-pds" style="box-sizing: border-box;">"</span>autoCloseBrackets<span class="pl-pds" style="box-sizing: border-box;">"</span></span>: <span class="pl-c1" style="box-sizing: border-box; color: #005cc5;">False</span>}}})</pre>
<br />
<div>
<br /></div>
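<div>
The update call above merges the new value into whatever is already stored, instead of replacing the whole section. Below is a minimal standard-library sketch of that idea. It is not the notebook implementation: recursive_update, update_section and the temporary directory are names I made up for the illustration, and I'm assuming the settings live in a JSON file per section (for the classic notebook, as far as I know, something like ~/.jupyter/nbconfig/notebook.json). The lineNumbers entry is only there to have a pre-existing value to preserve.</div>

```python
import json
import tempfile
from pathlib import Path


def recursive_update(target, changes):
    """Merge `changes` into `target` in place, descending into nested dicts."""
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            recursive_update(target[key], value)
        else:
            target[key] = value
    return target


def update_section(config_dir, section, changes):
    """Read <config_dir>/<section>.json, merge `changes`, and write it back."""
    path = Path(config_dir) / (section + ".json")
    config = json.loads(path.read_text()) if path.exists() else {}
    recursive_update(config, changes)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return config


# Demo in a throwaway directory, so no real Jupyter settings are touched.
demo_dir = tempfile.mkdtemp()
update_section(demo_dir, "notebook", {"CodeCell": {"cm_config": {"lineNumbers": True}}})
result = update_section(
    demo_dir, "notebook", {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}}
)
print(result)
```

<div>
The second call only touches autoCloseBrackets, and the value written by the first call is kept, which is why the one-liner above is safe to run on a profile that already has other settings.</div>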
Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com3tag:blogger.com,1999:blog-4345066302652424538.post-44502225753036808552015-01-19T01:58:00.001+01:002015-01-19T03:05:02.292+01:00Google Earth on Fedora<p>Installing Google Earth in Fedora is trickier than it should be. Here is a short HOWTO:</p>
<ul>
<li>Download the 64-bit Fedora version from the <a href="https://www.google.com/earth/download/ge/agree.html">Google Earth site</a></li>
<li>sudo yum install google-earth-stable_current_x86_64.rpm</li>
<li>OOOPS!!! You got <b>file /usr/bin from install of google-earth-stable-7.1.2.2041-0.x86_64 conflicts with file from package filesystem-3.2-28.fc21.x86_64</b></li>
</ul>
<p>The rpm has an error: it claims ownership of /usr/bin, which already belongs to the filesystem package. We'll rebuild the rpm without that entry using <b>rpmrebuild</b></p>
<ul>
<li>sudo yum install rpmrebuild</li>
<li>rpmrebuild -ep google-earth-stable_current_x86_64.rpm</li>
<li>A text editor opens with the spec file (the rpm configuration file); delete the line <b>%dir %attr(0755, root, root) "/usr/bin"</b></li>
<li>rpmrebuild will ask for confirmation and inform about the path of the generated rpm, just install it</li>
<li>sudo yum localinstall ~/rpmbuild/RPMS/x86_64/google-earth-stable-7.1.2.2041-0.x86_64.rpm</li>
</ul>
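<p>The manual edit in the steps above amounts to dropping one %dir entry from the spec file. Here is a small sketch of that filter in Python. It is only an illustration: drop_conflicting_dir is a name I made up, and apart from the /usr/bin line quoted above, the sample spec lines are invented and not taken from the real package.</p>

```python
def drop_conflicting_dir(spec_text, path="/usr/bin"):
    """Return the spec text without %dir entries that claim `path` itself."""
    kept = []
    for line in spec_text.splitlines():
        stripped = line.strip()
        # Only the directory entry is dropped; files below the path are kept.
        if stripped.startswith("%dir") and stripped.endswith('"' + path + '"'):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical fragment of a %files section, for illustration only.
sample_spec = '''%files
%dir %attr(0755, root, root) "/usr/bin"
%attr(0755, root, root) "/usr/bin/google-earth"
%dir %attr(0755, root, root) "/opt/google/earth"
'''

cleaned = drop_conflicting_dir(sample_spec)
print(cleaned)
```

<p>Only the entry for the directory itself is removed, so the package still installs its own files under /usr/bin without claiming to own the directory.</p>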
<p>Now the application is successfully installed, but it sometimes crashes when started. It looks like the best way around it is to install the 32-bit version, or Google Earth 6 (the latest is 7 at the time of writing this post). Unless you need a specific feature from version 7, I recommend installing version 6 rather than the 32-bit version of 7. The latter requires many dependencies, and it's still buggy on Fedora.</p>
<p>More info:</p>
<ul>
<li><a href="https://code.google.com/p/earth-issues/issues/detail?id=1525">https://code.google.com/p/earth-issues/issues/detail?id=1525</a></li>
</ul>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-34889297844531705042015-01-18T19:33:00.001+01:002015-01-19T01:30:11.490+01:00Skype on Fedora 21<p>Here is a blog post on how to install Skype on Fedora 21 by quickly creating an RPM package.</p>
<p>
<a href="http://mariuszs.github.io/blog/2014/skype_for_fedora_21.html">http://mariuszs.github.io/blog/2014/skype_for_fedora_21.html</a>
</p>
<p>It's also a great simplified tutorial on how to build any RPM.</p>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-49427019381887246052014-12-02T03:39:00.002+01:002014-12-02T03:39:40.273+01:00LATEX awesomenessI think LATEX is simply amazing by itself. Even more so when writing an academic document, but for any kind of doc, using LATEX really saves time and pain.<br />
<br />
The concept of creating a document class (defining all the styles of the document), and then simply forgetting about formatting and focusing on content, is just amazing. So are the automatic management of references (bibliography), the automatic management of figure and table labels, and the automatic creation of the table of contents. Not to mention how nice it is to create formulas.<br />
<br />
But besides LATEX itself, what I found really cool is <a href="http://www.sharelatex.com/">www.sharelatex.com</a>. I've been using it for a while, and after discovering some new features (new for me, not sure how long they've been there), I found it was the perfect editor for LATEX.<br />
<br />
The first advantage is the usual cloud stuff: no local installation needed, backups are managed by the service provider, it's accessible from different devices...<br />
<br />
But there are some others specific to the site:<br />
<br />
<ul>
<li>Collaborative environment: see what others are writing in real time, conflict avoidance, always up-to-date, and even author chat</li>
<li>Version control system: Don't lose any version, the history of changes is kept.</li>
<li>Ownership of your project: download a zip of all your files, sync to Dropbox, and also to GitHub (beta).</li>
<li>Extra features:</li>
<ul>
<li>Spell check</li>
<li>Autocomplete</li>
<li>And last but not least: VIM keybindings!!!</li>
</ul>
</ul>
<div>
A few of the features are for premium accounts only (+10 collaborators, Dropbox sync, full history access), but the free plan I'm using is exactly what I need so far.</div>
Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-11082292982676229892012-07-10T16:58:00.003+02:002014-11-19T23:25:38.854+01:00Brother printer on GNU/LinuxFor some reason, Brother printers (at least mine) do not take into account the settings specified for the printer in the regular way (Gnome settings in my case).<br />
<br />
But mysteriously, there is a command which can be used to change them properly (<span style="background-color: white;">brprintconf_mfc235c)</span><span style="background-color: white;">. In my case, I was having problems with the top margin, and the top of the pages was not printed. Apparently it was because of the page type, so I could fix it by:</span><br />
<br />
<code>
sudo brprintconf_mfc235c -pt A4
</code>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com2tag:blogger.com,1999:blog-4345066302652424538.post-45517904790268352072012-04-27T20:10:00.002+02:002014-11-19T23:32:15.320+01:00Fixing Gnome 3 design mistakes<br />
While there are some cool things in Gnome 3, I think almost everyone will agree that there are many design mistakes. To me, it looks like a bad copy of the Mac desktop, and it's especially annoying that they also brought the "let Steve Jobs decide it for you" philosophy to GNU. I don't really know who is leading the project, but it looks like they should send a resume to Apple.<br />
<br />
Anyway, the reason for this post is that I realized that I'm not alone. And I realized it in a strange way... Basically, there seems to be a consensus on which parts of Gnome 3 suck the most, and I arrived at this conclusion after seeing that there is a Fedora package to remove almost every nonsensical feature Gnome brings.<br />
<br />
These include (among others):<br />
<br />
<ul>
<li>Accessibility icon always on the bar</li>
<li>Alt-tab bothering system of grouping windows</li>
<li>Power off option not hidden by secret methods</li>
</ul>
<br />
<br />
I'm not a genius, so I don't usually share my opinions in Linus Torvalds style, but I think it's totally worth it in this case.<br />
<br />
So, if you're using Gnome 3, you should probably check this link out:<br />
<br />
<a href="http://fedoraproject.org/wiki/Features/GnomeShellConfigurability">http://fedoraproject.org/wiki/Features/GnomeShellConfigurability</a>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com1tag:blogger.com,1999:blog-4345066302652424538.post-34545452576621368322011-06-27T13:00:00.001+02:002011-08-09T17:46:20.000+02:00Create user and database in PostgresWhile I love Postgres, I get some problems every time I want to do the simple operation of creating a database with an associated user if it's been a while since the last time I did it.<br />
<br />
There are several posts on the Internet about Postgres authentication, but I couldn't find any explaining exactly what I wanted to know, so here is mine.<br />
<br />
This has been tested on <b>Debian 6</b> and <b>PostgreSQL 8.4</b>.<br />
<br />
1. Install the PostgreSQL server (obvious)<br />
2. Create the user:<br />
<code><br />
$ sudo -u postgres createuser -D -A -P <my-user><br />
</code><br />
3. Create the database<br />
<code><br />
$ sudo -u postgres createdb -O <my-user> <my-database><br />
</code><br />
4. Edit /etc/postgresql/8.4/main/pg_hba.conf<br />
<code><br />
# Put your actual configuration here<br />
local all all password<br />
host all all 127.0.0.1/32 password<br />
</code><br />
<b>NOTE:</b> Make sure that your settings are placed right after the comment saying where your configuration goes. PostgreSQL uses the first record that matches a connection, so if you place yours at the end, the default ones will be used, and you'll see this error when logging in:<br />
<code><br />
psql: FATAL: Ident authentication failed for user "<my-user>"<br />
</code><br />
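<br />
The ordering rule from the note can be illustrated with a small toy model. This is not PostgreSQL code, just a sketch: records are simplified to (type, database, user, method) tuples, addresses are ignored, and first_matching_method, my_user and my_database are names invented for the example.<br />

```python
def first_matching_method(records, conn_type, database, user):
    """Return the auth method of the first record matching the connection."""
    for rec_type, rec_db, rec_user, method in records:
        if (rec_type == conn_type
                and rec_db in ("all", database)
                and rec_user in ("all", user)):
            return method
    return None  # no record matched: the connection is rejected


# Our custom record placed before the distribution default...
custom_first = [
    ("local", "all", "all", "password"),
    ("local", "all", "all", "ident"),
]
# ...and the same records with our line appended at the end of the file.
defaults_first = [
    ("local", "all", "all", "ident"),
    ("local", "all", "all", "password"),
]

print(first_matching_method(custom_first, "local", "my_database", "my_user"))
print(first_matching_method(defaults_first, "local", "my_database", "my_user"))
```

With the custom record first, the connection is checked with password; with the defaults first, ident wins and the login fails with the error shown above.<br />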
<br />
Actually, you'll probably want to customize the settings you use. My settings allow logging in from localhost using an unencrypted password, but maybe you want to access from another host, only grant access to some users or some databases, or use other authentication methods, so I would recommend reading the <a href="http://developer.postgresql.org/pgdocs/postgres/auth-pg-hba-conf.html">pg_hba.conf reference</a>.<br />
<br />
Finally, you'll be able to access by:<br />
<code><br />
$ psql -U <my-user> -W<br />
</code>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com1tag:blogger.com,1999:blog-4345066302652424538.post-11960011596725788962011-06-25T01:19:00.002+02:002011-06-25T01:31:10.516+02:00Unified PythonAfter all these days at EuroPython, there is an idea that keeps me thinking. It is about how Python has different ways to represent what could be considered the same thing.<br />
<br />
In today's talk, Alex Martelli pointed out that "def" and "lambda" are actually the same concept. This was part of a broader idea: that both of them have the wrong name ("function" would be the right one), and that lambda should actually disappear, but that's another question.<br />
<br />
Also, yesterday, Raymond Hettinger reminded us that classes are actually dictionaries, something that most Pythonistas know, but which also made me think.<br />
<br />
Then, there is something that I never saw very clearly: the subtle difference between an instance and a dictionary, and how trivial the difference between person['name'] and person.name can be in some cases.<br />
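<br />
These three observations can be checked in regular Python. A minimal sketch follows; the Person class, the sample values and the double_* names are mine, just for illustration.<br />

```python
class Person:
    def __init__(self, name):
        self.name = name


person = Person("Alice")

# Instance attributes live in a plain dictionary.
print(vars(person))

# Writing to that dictionary is the same as setting an attribute.
vars(person)["age"] = 30
print(person.age)

# The class namespace is also a mapping (exposed as a read-only view).
print("__init__" in vars(Person))

# "def" and "lambda" produce objects of exactly the same type.
def double_def(x):
    return 2 * x

double_lambda = lambda x: 2 * x
print(type(double_def) is type(double_lambda))
```

So person.name is essentially vars(person)['name'] with nicer syntax, plus the lookup in the class when the instance doesn't have the attribute.<br />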
<br />
So, I wanted to do an experiment on how Python could look if it tried to merge all these entities into one single format, and even do some other things, like avoiding assignments that don't follow the assignment pattern (I mean class or function definitions here, where <i>def my_func[...]</i> is used instead of <i>my_func = [...]</i>).<br />
<br />
Below is what the most stupid example I could invent looks like, but first some definitions to make the idea easier to understand.<br />
<br />
<b>map</b>: could be also "class", "dict", "obj", "hash",... and it's the structure for dictionaries, classes and instances.<br />
<b>seq</b>: a list or tuple, any linear sequence of values.<br />
<b>func</b>: a function or callable, that in Python is defined by "def" or "lambda".<br />
<br />
<code><br />
foods = seq:<br />
"meat"<br />
"milk"<br />
"bread"<br />
<br />
sounds = map:<br />
"bark" = "woof woof"<br />
"mew" = "meow meow"<br />
<br />
animal = map:<br />
"step_size" = None<br />
"sound" = None<br />
<br />
"move" = func(self, num_steps):<br />
print("I've moved {} units".format(num_steps * self.step_size))<br />
<br />
"talk" = func(self):<br />
print(sounds.{self.sound})<br />
<br />
"eat" = func(self, food):<br />
print("I'm eating {}".format(food))<br />
<br />
cat = map(animal):<br />
"step_size" = 80<br />
"sound" = "mew"<br />
<br />
"eat" = func(self, food):<br />
print("I only eat {} if I want to".format(food))<br />
<br />
<br />
azrael = map(cat):<br />
"owner_name" = "Gargamel"<br />
<br />
azrael.move(5)<br />
for food in foods:<br />
azrael.eat(food)<br />
</code><br />
<br />
Of course, there are too many things that should be considered before being able to implement this syntax, but it can give an idea of how a more <i>unified</i> approach to Python syntax could look.<br />
<br />
See how the syntax for "sounds", which would be a dictionary, "cat", which would be a class, and "azrael", which would be an instance, is exactly the same.<br />
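<br />
Current Python can already get surprisingly close to this, because the built-in type() accepts a name, a tuple of base classes and a dictionary, and returns a class. The sketch below rebuilds part of the example that way; it is only an approximation of the invented syntax (I return the strings instead of printing them, and Animal, Cat, talk and eat are just names I chose).<br />

```python
sounds = {"bark": "woof woof", "mew": "meow meow"}


def talk(self):
    return sounds[self.sound]


def eat(self, food):
    return "I only eat {} if I want to".format(food)


# A class built from a name, its bases and a plain dictionary.
Animal = type("Animal", (), {"step_size": None, "sound": None, "talk": talk})
Cat = type("Cat", (Animal,), {"step_size": 80, "sound": "mew", "eat": eat})

# The instance adds its own entries on top of what the class provides.
azrael = Cat()
azrael.owner_name = "Gargamel"

print(azrael.talk())
print(azrael.eat("milk"))
print(vars(azrael))
```

The dictionary passed to type() plays the role of the "map" body, and the tuple of bases plays the role of the argument in map(animal).<br />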
<br />
Being used to Python syntax, it's difficult to say if this syntax would be readable; so far I just find it weird. But what looks clear is that this syntax would make the language simpler from the implementation point of view, and probably from the programmer's point of view too, although the programmer would probably need to forget some OOP concepts first.<br />
<br />
Whatever conclusion the reader gets from this example, I think it's quite interesting to see how a class can look exactly the same as a dictionary, and how an instance can look exactly like a subclass of the base class.Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com4tag:blogger.com,1999:blog-4345066302652424538.post-37308926460318846102011-06-11T20:48:00.007+02:002011-06-12T01:04:40.925+02:00Building RPMs for Python3.1While it's been a long time since the first stable version of Python 3 was released, it's not yet available on several operating systems. Looking for a repository with Python 3 rpms, I found <a href="http://iuscommunity.org/Repos">IUS Community</a>, but I had some problems with it, so I decided to build my own rpms.<br />
<br />
The process for building an rpm from a source tarball is pretty easy (if you know the steps). The only problem in this case is that the .spec file delivered with Python is not up to date, so the process fails.<br />
<br />
I made the required changes to the .spec file, and I uploaded it to: <a href="http://files.vaig.be/python-3.1.spec">http://files.vaig.be/python-3.1.spec</a> (NOTE that you need to edit the exact version of Python you're building, in line 37. The version in the uploaded file is 3.1.3, but it can be changed to another one, such as 3.1.4rc1).<br />
<br />
Below are the steps for creating an RPM package for Python 3.1 on CentOS 5 (using my custom .spec file):<br />
<br />
<code><br />
# Install required software<br />
yum install rpm-build gcc expat-devel db4-devel gdbm-devel sqlite-devel ncurses-devel readline-devel zlib-devel openssl-devel<br />
<br />
# Download Python source<br />
cd /usr/src/redhat/SOURCES/<br />
wget http://www.python.org/ftp/python/3.1.3/Python-3.1.3.tar.bz2<br />
<br />
# Download .spec (rpm specifications file)<br />
cd /usr/src/redhat/SPECS/<br />
wget http://files.vaig.be/python-3.1.spec<br />
<br />
# Generate RPMs (and SRPMs)<br />
rpmbuild -ba /usr/src/redhat/SPECS/python-3.1.spec<br />
</code><br />
<br />
Compiling Python and creating the RPM will take a while, but after this process, you'll have the RPMs at:<br />
<br />
<code><br />
/usr/src/redhat/SRPMS/python3.1-3.1.3-1pydotorg.src.rpm<br />
/usr/src/redhat/RPMS/<YOUR-ARCH>/python3.1-3.1.3-1pydotorg.i386.rpm<br />
/usr/src/redhat/RPMS/<YOUR-ARCH>/python3.1-devel-3.1.3-1pydotorg.i386.rpm<br />
/usr/src/redhat/RPMS/<YOUR-ARCH>/python3.1-tools-3.1.3-1pydotorg.i386.rpm<br />
</code>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-40316026691621568242010-12-26T14:35:00.004+01:002010-12-27T04:14:30.005+01:00Joel test for software companies<div>Today I discovered the Joel test, a test to evaluate software companies. While the article is pretty out-of-date, and there are some points that are exclusively for companies working on compiled programming languages, the article is still very interesting.</div><div><br /></div><div>I think every software company should take the test. Also I think it can be useful for programmers, to evaluate if a company is a good place to work.</div><div><br /></div><div>The original article is at:</div><a href="http://www.joelonsoftware.com/articles/fog0000000043.html">http://www.joelonsoftware.com/articles/fog0000000043.html</a><br /><br /><div>Let me provide a summary of the original questions here:</div><div><ul><li>Do you use source control?</li><li>Can you make a build in one step?</li><li>Do you make daily builds?</li><li>Do you have a bug database?</li><li>Do you fix bugs before writing new code?</li><li>Do you have an up-to-date schedule?</li><li>Do you have a spec?</li><li>Do programmers have quiet working conditions?</li><li>Do you use the best tools money can buy?</li><li>Do you have testers?</li><li>Do new candidates write code during their interview?</li><li>Do you do hallway usability testing?</li></ul></div><div>My personal update to the questions would be:</div><div><ul><li>Do you use a distributed source control system?</li><li>Do you use a bug database where users can report bugs directly?</li><li>Do you have a testing protocol, and specific resources for testing?</li><li>Do you fix bugs before implementing new features?</li><li>Do you have automated build or deployment procedures?</li><li>Do you have a roadmap, and do you avoid making important changes to the short-term priorities?</li><li>Does your team work in good conditions (quiet environment, flexible schedule, freedom to choose development software, fair paycheck...)?</li></ul><div>I think those questions can give you an idea of how efficient your company is, and indirectly, of the quality of your software.</div></div>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com68tag:blogger.com,1999:blog-4345066302652424538.post-86492235243358331602010-12-25T08:53:00.005+01:002011-06-11T23:44:31.906+02:00Branching with MercurialThis is a simple guide on how to do simple branching operations in Mercurial.<br />
<br />
First of all, let's go over the two different options for branching in Mercurial, and in most distributed source control systems.<br />
<br />
The first option is to create a clone of the original repository to create a branch. This option can be simpler for the user, who has a different directory for every branch, and there are no special operations to switch from one branch to another. Another advantage is that branches can be safely switched when some changes are not yet committed, as every branch is in a different directory.<br />
<br />
The second option is to use the Mercurial branching commands, and to keep all branches in the same repository. The main advantage of doing this is that branches are distributed by push and pull operations. This is very important if different programmers need to work with different branches, or if you want to replicate all branches when synchronizing your code with, for example, <a href="http://code.google.com/">Google Code</a> or <a href="https://bitbucket.org/">Bitbucket</a>.<br />
<br />
This second option is the one I'll cover in this simple guide.<br />
<br />
Imagine we have an existing repository with some code, and we have never used branching before.<br />
<br />
If we perform:<br />
<br />
<code><br />
# hg branches<br />
default 69:3f5490390a0b<br />
</code><br />
<br />
we can see that we already have a branch named default, which is the one that holds all our changesets and code.<br />
<br />
Now we want to start working on a new version of our application, while still being able to fix bugs in the version we already deployed. We can do this by creating a new branch in our repository.<br />
<br />
The following example creates a new branch named newversion, and commits a new file named newfile to it.<br />
<br />
<code><br />
# hg branch newversion<br />
# touch newfile<br />
# hg add newfile<br />
# hg commit -m "commit on the new branch newversion"<br />
</code><br />
<br />
After this, if we check for the branches again, we'll have this:<br />
<br />
<code><br />
# hg branches<br />
newversion 70:720062b481d7<br />
default 69:3f5490390a0b (inactive)<br />
</code><br />
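<br />
If this listing has to be consumed by a script rather than read by eye, it can be parsed line by line. What follows is only a minimal sketch in Python, assuming the plain-text layout shown above (branch name, then revision:hash, then an optional "(inactive)" marker). The function name parse_hg_branches is made up for this example and is not part of Mercurial.<br />

```python
def parse_hg_branches(output):
    """Parse the text printed by `hg branches` into a list of dicts.

    Assumes every non-empty line looks like:
        <name> <rev>:<node> [(inactive)]
    which is the layout shown in this post; other markers are not handled.
    """
    branches = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        inactive = line.endswith("(inactive)")
        if inactive:
            line = line[: -len("(inactive)")].rstrip()
        # Split from the right, so a branch name containing spaces survives.
        name, changeset = line.rsplit(None, 1)
        rev, node = changeset.split(":", 1)
        branches.append(
            {"name": name, "rev": int(rev), "node": node, "active": not inactive}
        )
    return branches


sample = """\
newversion 70:720062b481d7
default 69:3f5490390a0b (inactive)
"""
for branch in parse_hg_branches(sample):
    print(branch["name"], branch["rev"], branch["active"])
```

In a real script the sample text would be replaced by the captured output of hg branches.<br />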
<br />
After working on the new branch, we find a bug in the deployed version, and we want to fix it there. To do so, we switch to the default branch, so that our working directory shows the content of that branch, and we commit the fix:<br />
<br />
<code><br />
# hg update default<br />
# > bugfixedfile<br />
# hg add bugfixedfile<br />
# hg commit -m "bug fixed in default branch"<br />
</code><br />
<br />
Note that we shouldn't keep branch-specific files outside version control, as Mercurial leaves untracked files in the working directory after switching branches. The -C option of hg update discards uncommitted changes to tracked files, but it does not delete untracked files; those have to be removed by hand (or with the purge extension).<br />
<br />
To find out the current branch, we can use the branch command with no parameters:<br />
<br />
<code><br />
# hg branch<br />
default<br />
</code><br />
<br />
After fixing the bug in the default branch, we'll probably want the fix in the new version too, so we merge default into newversion:<br />
<br />
<code><br />
# hg update newversion<br />
# hg merge default<br />
# hg commit -m "merged from branch default"<br />
</code><br />
<br />
Once the new version is ready to be deployed, we'll probably want to merge it back into default, so that default remains the deployed version. It's recommended to first merge all changes from the default branch into the new version branch, before merging it back into default.<br />
<br />
<code><br />
# hg update newversion<br />
# hg merge default<br />
# hg commit -m "merged from branch default"<br />
<br />
# hg update default<br />
# hg merge newversion<br />
# hg commit -m "merged branch newversion into default"<br />
</code><br />
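<br />
If this round trip is repeated often, the same sequence can be scripted. The sketch below is just one way to do it in Python, assuming hg is on the PATH and the script runs inside the repository. The helper names merge_back_commands and merge_back are invented for this example.<br />

```python
import subprocess


def merge_back_commands(feature, mainline="default"):
    """Return the hg commands that merge `feature` back into `mainline`.

    Mirrors the steps in this post: first bring the mainline changes into
    the feature branch, then merge the feature branch into the mainline.
    """
    return [
        ["hg", "update", feature],
        ["hg", "merge", mainline],
        ["hg", "commit", "-m", "merged from branch %s" % mainline],
        ["hg", "update", mainline],
        ["hg", "merge", feature],
        ["hg", "commit", "-m", "merged branch %s into %s" % (feature, mainline)],
    ]


def merge_back(feature, mainline="default", run=subprocess.check_call):
    """Run the commands in order, stopping at the first one that fails.

    `run` is injectable, so the sequence can be dry-run with run=print.
    """
    for command in merge_back_commands(feature, mainline):
        run(command)


# Dry run: print the commands instead of executing them.
merge_back("newversion", run=print)
```

With the default runner, subprocess.check_call raises an exception as soon as a command exits with an error (for example, when there is nothing to merge), so later steps are not executed on top of a failed one.<br />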
<br />
Finally, the last thing we would want to do is to close the branch where we developed the new version, as further changes to this version will be made to the default branch. It's as simple as:<br />
<br />
<code><br />
# hg update newversion<br />
# hg commit --close-branch -m "closing branch newversion after being merged to default"<br />
</code>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-28345324359980336662010-12-05T03:41:00.003+01:002011-06-11T23:45:57.873+02:00Two simple steps to reduce bandwidth on static filesThe first step is to let Google host your JavaScript library of choice for you. The Google Libraries API hosts jQuery, MooTools, Prototype... and you can link to them directly from your website.<br />
<br />
More info at:<br />
<a href="http://code.google.com/apis/libraries/devguide.html">http://code.google.com/apis/libraries/devguide.html</a><br />
<br />
<div>The second step is to compress your CSS file (or files, although if you are going to compress them to save bandwidth, you probably want to merge them into one for better performance). There are several websites that compress CSS files online, for free. The one I found to work best is:<br />
<a href="http://www.lotterypost.com/css-compress.aspx">http://www.lotterypost.com/css-compress.aspx</a></div>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com2tag:blogger.com,1999:blog-4345066302652424538.post-89677528238867369232010-12-04T23:48:00.006+01:002011-06-11T23:49:43.015+02:00Debugging with PDB and App EngineThe Python debugger (pdb) doesn't work as usual on the App Engine SDK. After adding this to my project:<br />
<br />
<code><br />
import pdb; pdb.set_trace()<br />
</code><br />
<br />
I got:<br />
<br />
<code><br />
Blocking access to skipped file "<my_path>/.pdbrc"<br />
<br />
File "/usr/lib/python2.6/bdb.py", line 46, in trace_dispatch<br />
return self.dispatch_line(frame)<br />
File "/usr/lib/python2.6/bdb.py", line 65, in dispatch_line<br />
if self.quitting: raise BdbQuit<br />
</code><br />
<br />
But, as posted in <a href="http://morethanseven.net/2009/02/07/pdb-and-appengine.html">morethanseven</a>, it's easy to make it work using:<br />
<br />
<code><br />
import pdb <br />
import sys <br />
for attr in ('stdin', 'stdout', 'stderr'):<br />
    setattr(sys, attr, getattr(sys, '__%s__' % attr))<br />
pdb.set_trace()<br />
</code>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-1187716764527966422010-01-01T20:46:00.003+01:002010-01-01T21:22:38.261+01:00Linux and Debian simple boot<div>Today I've been researching how Linux and Debian boot.</div><div><br /></div><div>There is an excellent article from IBM, which explains the procedure and the parts involved:</div><div><br /></div><a href="http://www.ibm.com/developerworks/linux/library/l-linuxboot/">http://www.ibm.com/developerworks/linux/library/l-linuxboot/</a><div><br /></div><div>Basically:</div><div><ul><li>The BIOS checks the CMOS and chooses the boot device.</li><li>Control is given to the device's MBR (physically, the first 512 bytes).</li><li>The MBR checks for partitions on the device (in a self-contained table), and gives control to the bootable partition.</li><li>Then GRUB, LILO or another boot loader takes control, to load the kernel and the file system.</li><li>Usually an <a href="http://en.wikipedia.org/wiki/Initrd">initrd</a> filesystem is loaded before the "real" one. This way, the kernel can access this filesystem while the modules for loading the one in the root partition are not yet loaded.</li><li>Finally, the init program is called, to load all <a href="http://en.wikipedia.org/wiki/User_space">user-space</a> applications.</li></ul><div>To set this up on a USB drive (my idea), in a simple way, we need to:</div><div><br /></div><div>Make the drive bootable, using the <a href="http://en.wikipedia.org/wiki/Syslinux">syslinux</a> tool, which is used for FAT filesystems:</div><div>syslinux /dev/sdb (or whatever device you want)</div><div><br /></div><div>Then, mount the filesystem, and copy:</div><div>linux: the Linux kernel binary</div><div>initrd.gz: compressed cpio file containing the initrd file tree</div><div><br /></div><div>and</div><div><br /></div><div>syslinux.cfg: syslinux settings, to let syslinux know where to find the kernel and the initrd. 
Basically:</div><div><br /></div><div>default linux</div><div>append initrd=initrd.gz</div><div><br /></div><div>Then, just restart, and your device will boot your kernel and your filesystem.</div><div><br /></div><div>Here you can find a Linux kernel and an initrd file, which will load a basic Linux system running the Debian installer:</div><div><br /></div><div><a href="http://ftp.debian.org/debian/dists/stable/main/installer-i386/current/images/netboot/debian-installer/i386/">http://ftp.debian.org/debian/dists/stable/main/installer-i386/current/images/netboot/debian-installer/i386/</a></div><div><br /></div><div>Some more info on it at:</div><div><br /></div><div><a href="http://www.debian.org/releases/stable/i386/ch04s03.html.en">http://www.debian.org/releases/stable/i386/ch04s03.html.en</a></div><div><br /></div></div>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com0tag:blogger.com,1999:blog-4345066302652424538.post-72718921695719095732009-12-22T22:52:00.002+01:002009-12-22T23:09:46.651+01:00New localization system already in trunkJust a few hours ago, Django's new localization system was committed to trunk.<div><br /></div><div>As some of you know, I did most of the work as my Google Summer of Code project this year. Of course, together with <a href="http://jannisleidel.com/">Jannis Leidel</a>, who also did the final steps, including the commit.</div><div><br /></div><div>In summary, with this change Django will format all displayed data according to the user's current locale. For example, the calendar will display Sunday as the first day for users in the States, but Monday for users from most European countries. 
Also it'll format numbers and dates.</div><div><br /></div><div>You can check the slides I presented at DjangoCon.</div><div><a href="http://docs.google.com/present/view?id=dfbzs3ks_16d26xjbd9">http://docs.google.com/present/view?id=dfbzs3ks_16d26xjbd9</a></div><div><br /></div><div>Note that the setting is no longer USE_FORMAT_I18N (as in the slides), but USE_L10N.</div><div><br /></div><div>You can also check the commit at:</div><div><a href="http://code.djangoproject.com/changeset/11964">http://code.djangoproject.com/changeset/11964</a></div><div><br /></div><div><br /></div>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com7tag:blogger.com,1999:blog-4345066302652424538.post-81307409468991317812009-04-24T19:09:00.004+02:002009-04-24T19:16:32.517+02:00GSoC: Implementation of additional i18n features on Django<p><span><span>Here you have my proposal for Google Summer of Code </span></span><span><span>2009. It was approved previous week, and I'll be working on it during this summer.</span></span><span style="font-size:180%;"><span><span><span><strong><br /></strong></span></span></span></span></p> <h2><span><span><span><strong><span><span><span><strong>The problem<br /></strong></span></span></span></strong></span></span></span></h2><p><span><span>While Django provides an amazing system to translate texts, and displays localized dates in some parts of the admin; it has many data that could be internationalized, not it's not yet.</span></span></p> <p><span><span>The information that developers should be able to localize/translate is mainly:</span></span></p> <ul><li> <p><span><span><span>All dates and related information (times, calendars...)</span></span></span></p> </li><li> <p><span><span><span>All numbers (mainly decimal ones)</span></span></span></p> </li><li> <p><span><span><span>Texts (and any data in general) saved on the database</span></span></span></p> </li></ul> <p><span><span><span>More information on these issues can be found in 
the following blog post and this ticket:</span></span></span></p> <p><a href="http://vaig.be/2008/07/django-i18n-status.html"><span><span>http://vaig.be/2008/07/django-i18n-status.html</span></span></a></p> <p><a href="http://code.djangoproject.com/ticket/7980"><span><span>http://code.djangoproject.com/ticket/7980</span></span></a></p> <h2><span><span><span><strong><span><span><span><strong>Proposal</strong></span></span></span></strong></span></span></span></h2> <p><span><span>The proposed solution for improving Django i18n includes several different tasks. Those tasks are:</span></span></p> <ul><li> <p><span><span>Import locale data from CLDR</span></span></p> </li><li> <p><span><span>Apply i18n to Django dates and times</span></span></p> </li><li> <p><span><span>Apply i18n to Django numbers</span></span></p> </li><li> <p><span><span>Allow translating content on the database</span></span></p> </li><li><span><span>Fix already reported bug about i18n<br /></span></span></li></ul> <p><span><span>Next are the details for every task. Note that all those specifications are subject to change, according to discussions with the mentor of the project, Django core developers team, and the main Django community.</span></span></p> <h2><span><span><span><strong><span><span><span><strong>Importing locale data<br /></strong></span></span></span></strong></span></span></span></h2> <p><span><span><span>The main repository of locale data is the Common Locale Data Repository (CLDR) by the Unicode Consortium </span></span>http://cldr.unicode.org/.<span><span> It provides a set of XML files with information such as date, time and number formatting for most languages.</span></span></span></p> <p><span><span>The idea of this task would be to create a python script (probably as a django-admin command), that will extract all necessary data from those XML files and put it into configuration files on the Django structure. 
This information will be used by Django to internationalize data on applications.</span></span></p> <p>The idea of this script is to be used just by Django developers. It would mainly be a one-time execution script, and then it would be executed just when new locales are added (are some are changed).</p> <p><span><span>All information gathered from CLDR files could be saved on django/conf/locale/{ language code }/formats/django.po</span></span></p> <p><span><span>Specific settings imported from CLDR could be (with English localized example):</span></span></p> <ul><li> <p><span><span>SHORT_DATETIME_FORMAT (12-31-2000 11:59 p.m.)</span></span></p> </li><li> <p><span><span>LONG_DATETIME_FORMAT (December 31th 2000, 11:59 p.m.)</span></span></p> </li><li> <p><span><span>SHORT_DATE_FORMAT (12-31-2000)</span></span></p> </li><li> <p><span><span>LONG_DATE_FORMAT (December 31th 2000)</span></span></p> </li><li> <p><span><span>FIRST_DAY_OF_WEEK (0 meaning Sunday)</span></span></p> </li><li> <p><span><span>TIME_FORMAT (11:59 p.m.)</span></span></p> </li><li> <p><span><span>YEAR_MONTH_FORMAT (December of 2000)</span></span></p> </li><li> <p><span><span><span>MONTH_DAY_FORMAT (December 31th)</span></span></span></p> </li></ul> <ul><li> <p><span><span>DECIMAL_NUMBER FORMAT (1,000,000.123)</span></span></p> </li></ul> <p><span><span>There are some locale based parameters that already exist on Django, on translation files (LC_MESSAGES) and could be deprecated on future releases of Django (when breaking backward compatibility). Those are:</span></span></p> <ul><li> <p><span><span>DATETIME_FORMAT</span></span></p> </li><li> <p><span><span>DATE_FORMAT</span></span></p> </li><li> <p><span><span>TIME_FORMAT</span></span></p> </li><li> <p><span><span>YEAR_MONTH_FORMAT</span></span></p> </li><li> <p><span><span>MONTH_DAY_FORMAT</span></span></p> </li></ul> <p>For keeping the system flexible, existing default values on settings will be kept. 
Probably it would be worth to add new ones for the new customizable formats.</p> <h2><span><span><strong>Dates, times and calendar i18n</strong></span></span></h2> <p><span><span>All dates and times displayed using Django should use the format defined for the current session locale. This is already implemented for some dates, like the ones displayed in admin's lists. Also a filter for formatting dates already exists in templates, which, together with the formats in the translation files, can do the job. But the good way to do that would be displaying the date by default on the session locale.</span></span></p> <p><span><span>All Django forms (including admin forms) should accept the short date/datetime format of the current locale. Now it's possible to define the accepted formats using parameters of the widget, and this can be kept, but at least support for entering data formatted in current locale should be added. ISO and/or English locale can be kept as well. Existing data on input fields should be displayed in current locale too.</span></span></p> <p>As Django 1.0 series is maintaining backward compatibility, those changes have to be implemented being compatible with existing behavior by default.</p> <p><span><span>The calendar on admin's date/datetime field should also be displayed according to user session locale.</span></span></p> <p><span><span>So basically those are the main tasks required for internationalizing Django dates:</span></span></p> <ul><li> <p><span><span>Format all python date/datetimes objects using locale settings when converted to string to be displayed. 
Basically it means models.DateField and models.DateTimeField values on model instances.</span></span></p> </li><li> <p><span><span>Change input widgets to display data and to allow entering data on the format of the current locale.</span></span></p> </li><li> <p><span><span>Display admin calendar starting weeks on the day defined for current locale.</span></span></p> </li></ul> <p><span><span>With those changes next tickets would be fixed:</span></span></p> <ul><li> <p><span><span><span>#1061 About first day on calendars</span></span></span></p> </li><li> <p><span><span><span>#5526 About accepting non-English formats on input widgets</span></span></span></p> </li><li> <p><span><span><span>#6231 About the output format of the SelectDateWidget</span></span></span></p> </li><li> <p><span><span><span>#6449 About default format of displayed dates</span></span></span></p> </li><li> <p><span><span><span>#6483 About supporting European dates on javascript routines</span></span></span></p> </li><li> <p><span><span><span>#7509 About supporting different formats on SplitDateTimeWidget</span></span></span></p> </li><li> <p><span><span><span>#7656 About inheriting i18n features of AdminDateWidget</span></span></span></p> </li></ul> <h2><span><span><strong>Number i18n</strong></span></span></h2> <p><span><span>Right now, Django doesn't provide anything for localizing numbers on applications. All numeric values within Django applications are formatted using American formats. Users from many countries are not used to dealing with the American format, and a simple shop using Django can create confusion among users who, for example, expect the comma to be the decimal separator, and they find the point on prices.</span></span></p> <p>As for the previous section, changes must be applied keeping backward compatibility.</p> <p><span><span>So Django should display, and use by default the language of the current locale to format numbers. 
Basically that means:</span></span></p> <ul><li> <p><span><span><span>Format numbers on templates using current session locale</span></span></span></p> </li><li> <p><span><span><span>Display and allow entering data using session locale on input widgets</span></span></span></p> </li></ul> <p><span><span>With those changes next ticket should be fixed:</span></span></p> <ul><li> <p><span><span>#3940 About comma as decimal separator</span></span></p> </li></ul> <h2><span><span><strong>Translating dynamic content</strong></span></span></h2> <p><span><span>Django has an amazing system for translating texts to any language. The only problem of this system is that it can just be used for static content (defined on source files, including templates), and not for dynamic content, created by users after deploying the web site. This can be useful for many different situations like an application that has a product catalog where product names and descriptions have to be translated, or a news website, where news can be translated to any language.</span></span></p> <p><span><span>There are some external applications, widely used, that allow to do that on Django, but all of them have many different problems, like complex and tricky syntax for developers, ugly interface for users, bad design, bad scalability... 
Main applications are:</span></span></p> <ul><li> <p><span><span><span>django-multilingual</span></span></span></p> </li><li> <p><span><span><span>transdb</span></span></span></p> </li><li> <p><span><span><span>django-transmeta</span></span></span></p> </li><li> <p><span><span><span>django-multilingual-model</span></span></span></p> </li></ul> <p><span><span><span>A comparison of the two first applications, and some ideas for a better solution, can be found on a presentation at</span></span></span></p> <p><a href="http://docs.google.com/Presentation?docid=dfbzs3ks_7f2z85hvr&hl=en"><span><span><span>http://docs.google.com/Presentation?docid=dfbzs3ks_7f2z85hvr&hl=en</span></span></span></a></p> <p><span><span>Basically, a good solution to allow Django developers to translate their models should include:</span></span></p> <ul><li> <p><span><span><span>An easy way to specify translatable fields on models (or outside the models)</span></span></span></p> </li><li> <p><span><span><span>An easy way to allow translating content using the admin or custom forms</span></span></span></p> </li><li> <p><span><span><span>Displaying translated fields in session language by default (allowing to get the value for a specific value)</span></span></span></p> </li><li> <p><span><span><span>A scalable way to save translations on the database</span></span></span></p> </li></ul> <p><span><span><span>To achieve those targets a lot of analysis is required, so, just some ideas are detailed here.<br /></span></span></span></p> <p><span><span><span>For the model syntax there are many different options, some of them can be checked on this blog post, and this poll:</span></span></span></p> <p><a href="http://vaig.be/2009/03/django-multilingual-syntax-poll.html"><span><span>http://vaig.be/2009/03/django-multilingual-syntax-poll.html</span></span></a></p> <p><a href="http://doodle.com/aicvayf8ss2mxm2h"><span><span><span>http://doodle.com/aicvayf8ss2mxm2h</span></span></span></a></p> 
<p><span><span>The most popular one is (using an example):</span></span></p> <pre><span><span>class MyModel(model.Model):</span></span><br /> <span>my_field = CharField()</span><br /> <span>my_i18n_field = CharField()</span><br /><br /> <span>class Meta:</span><br /> <span>translate = ('my_i18n_field',)</span></pre> <p><span><span>A way to translate models (and whole applications) without modifying its code would be great, in order to translate applications that already exist.</span></span></p> <p>For the database backend there are also different options, including:</p> <ul><li>To create a field on the model for every translation</li><li>To create a related model</li></ul> <p><span><span>There is just one generic ticket on Django that would be fixed:</span></span></p> <ul><li> <p><span><span>#6460 About multilingual content on database</span></span></p> </li></ul> <p>May be it's not possible having a generic solution that fits most of the user-cases, and in that case could be worth making some modifications on Django to make it easier creating external applications that can do this job.</p> <h2>Fix i18n bugs</h2> <p>There are many bugs already accepted on Django trac, that would be fixed on this Summer of Code. 
A better review will be done, but some of them could be:</p> <ul><li>#3907: LocaleMiddleware allows languages not supported by Django</li><li>#5494: Javascript catalog doesn't check project level locales</li><li>#7050: make-messages should ignore applications with custom localization</li></ul> <h2><span><span><span><strong>Timeline</strong></span></span></span></h2> <p><span><span>The estimated time line for this project, detailed in a weekly basis is:</span></span></p> <ul><li> <p><span><span>Week 01: Analysis and working environment setup<br /></span></span></p> </li><li> <p><span><span>Week 02: </span></span><span><span>Import CLDR</span></span></p> </li><li> <p><span><span>Week 03: Import CLDR</span></span></p> </li><li> <p><span><span>Week 04: </span></span><span><span>I18n of dates and numbers</span></span></p> </li><li> <p><span><span>Week 05: I18n of dates and numbers</span></span></p> </li><li> <p><span><span>Week 06: I18n of dates and numbers</span></span></p> </li><li> <p><span><span>Week 07: </span></span><span><span>Translation of dynamic content</span></span></p> </li><li> <p><span><span>Week 08: Translation of dynamic content</span></span></p> </li><li> <p><span><span>Week 09: Translation of dynamic content</span></span></p> </li><li> <p><span><span>Week 10: Translation of dynamic content</span></span></p> </li><li> <p><span><span>Week 11: Fix i18n bugs</span></span></p> </li><li> <p><span><span>Week 12: Fix i18n bugs</span></span></p> </li></ul> <p><span><span>My dedication to the project will be full time, around 40 hours per week. A total of 480 hours are estimated for the whole project.</span></span></p> <h2><span><span><span><strong>About me</strong></span></span></span></h2> <p><span><span>My name is Marc Garcia, I'm from Barcelona, Europe, and I'm 29 years old.</span></span></p> <p><span><span>I am studying computer science at Universitat Oberta de Catalunya, an Internet-based university from Barcelona. 
Currently I'm not working but I have almost 8 years of programming experience (with different technologies, mainly Python, PHP and VB).</span></span></p> <p><span><span>I started using Django in 2006, and at this time I developed and participated on the development of many websites, as well as many reusable applications for Django.</span></span></p> <p><span><span>As examples of reusable Django applications note:</span></span></p> <ul><li> <p><span><span>django-stdimage: Saves ImageField files with standard names, allowing to delete them, and creating automatic thumbnails.</span></span></p> </li><li> <p><span><span>Transdb: Allows translating database content</span></span></p> </li><li> <p><span><span>django-transmeta: Also allows translating database content (different approach)</span></span></p> </li><li> <p><span><span>django-cart: Simple cart object to easily add/update/remove products to user session</span></span></p> </li></ul> <p> </p> <p><span><span>As examples of websites, note next ones:</span></span></p> <ul><li> <p><span><span>http://elisa.fluendo.com (main developer)</span></span></p> </li><li> <p><span><span>http://www.andalucia.org (developer of some parts, mainly the shop and the registration system)</span></span></p> </li><li> <p><span><span>http://www.muchomasqueunregalo.com (developer of the Django part of the web site, including the shopping system and product detail pages).</span></span></p> </li><li> <p><span><span>http://www.accopensys.com (only developer)</span></span></p> </li><li> <p><span><span>http://www.showroom.es (only developer)</span></span></p> </li><li> <p><span><span>http://www.tierratenis.com (only developer)</span></span></p> </li><li> <p><span><span>http://www.latelierdelraval.com (only developer)</span></span></p> </li><li> <p><span><span>http://www.restaurantalpunt.com (only developer)</span></span></p> </li></ul> <p><span><span>I'm also one of the two official translators of Django to Castilian Spanish and Catalan. 
In addition, I was interviewed about localization on Django on <em>This Week in Django</em> 20 (on 2008-04-27). I maintain a blog with many Django related posts at http://vaig.be.</span></span></p>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com12tag:blogger.com,1999:blog-4345066302652424538.post-58016274436685176562009-03-25T18:50:00.002+01:002009-03-25T19:22:40.734+01:00django-cart released!Until now, if you had to develop an online store in django, you had two options, use <a href="http://www.satchmoproject.com/">satchmo</a>, or write your own code. Satchmo is a huge application that tries to provide everything for all the cases, so for a simple shop you've to deal with hundreds of features that you're not going to use, and in some case it won't be enough flexible.<br /><br />So what I've not any complain for satchmo, the fact is that is not the ideal solution for some cases as some small shops with few options.<br /><br />With that said, this post is to announce the release of a new project that could help some people to do simple web shops in a very simple way. This project is <a href="http://code.google.com/p/django-cart/">django-cart</a>.<br /><br />While django-cart already existed, it was an unfinished (and unmaintained) project by Eric Woundenberg, to whom I'm very thankful for letting reuse it's project, and avoid confusion.<br /><br />So, what's django-cart. Django Cart is basically a django application that provides a Cart class, with add/remove/update and get methods to be used for storing products. The products model isn't included in the application, so you can define your products with the fields you need. Then you just need something like...<br /><pre><br />product_to_add = MyProductModel.objects.get(id=whatever)<br />cart = Cart(request)<br />cart.add(product_to_add, product_to_add.price)<br /></pre><br />and your product will be saved on the database, on a session based cart. 
Getting the content of the cart is as easy as itering the cart instance.<br /><br />And basically that's it. More information is available on the project page. Just note that the current version of the application is unstable, and hasn't been tested enough, so feel free to use it, but consider that you'll have to test it by yourself and report/fix some bugs.<br /><br />I hope all you like it!Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com12tag:blogger.com,1999:blog-4345066302652424538.post-46050560159991892782009-03-11T13:30:00.003+01:002009-03-11T16:11:40.076+01:00Getting client OS in DjangoSome times it can be useful to serve our site content with little differences depending on the visitor operating system. I really think it's a bad idea changing the content or doing some big changes, depending on it, but this post can be useful for it as well.<br /><br />So, while most time just some Javascript is used to customize user experience based on its operating system, few times it'll also be useful to do it in the server side.<br /><br />For those cases, here you've a simple context processor that will make available a template variable named "platform" which content can be "Linux", "Mac" or "Windows".<br /><br /><pre><br />import re<br /><br />def user_agent(request):<br /> ''' <br /> Context processor for Django that provides operating system<br /> information base on HTTP user agent.<br /> A user agent looks like (line break added):<br /> "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.6) \<br /> Gecko/2009020409 Iceweasel/3.0.6 (Debian-3.0.6-1)"<br /> '''<br /> print 'user_agent'<br /> # Mozilla/5.0<br /> regex = '(?P<application_name>\w+)/(?P<application_version>[\d\.]+)'<br /> regex += ' \('<br /> # X11<br /> regex += '(?P<compatibility_flag>\w+)'<br /> regex += '; '<br /> # U <br /> regex += '(?P<version_token>[\w .]+)'<br /> regex += '; '<br /> # Linux i686<br /> regex += '(?P<platform_token>[\w .]+)'<br /> # anything else<br /> 
regex += '; .*'<br /><br /> user_agent = request.META['HTTP_USER_AGENT']<br /> result = re.match(regex, user_agent)<br /> if result:<br /> result_dict = result.groupdict()<br /> full_platform = result_dict['platform_token']<br /> platform_values = full_platform.split(' ')<br /> if platform_values[0] in ('Windows', 'Linux', 'Mac'):<br /> platform = platform_values[0]<br /> elif platform_values[1] in ('Mac',):<br /> # Mac is given as "PPC Mac" or "Intel Mac"<br /> platform = platform_values[1]<br /> else:<br /> platform = None<br /> else:<br /> full_platform = None<br /> platform = None<br /><br /> return {<br /> 'user-agent': user_agent,<br /> 'full_platform': full_platform,<br /> 'platform': platform,<br /> } <br /></pre><br /><br />To make it work just copy the code in a file<br /><br /><pre>myproject/myapp/context_processors.py</pre><br />add it to context processors in settings<br /><br /><pre>TEMPLATE_CONTEXT_PROCESSORS = ('myproject.myapp.context_processors.user_agent', [...])</pre><br />and don't forget to add the RequestContext parameter if you are processing your template with render_to_response and want the variable available <br /><br /><pre>from django.template import RequestContext<br />[...]<br />render_to_response('mytemplate.html', mycontext, <span style="font-weight:bold;">RequestContext(request)</span>)</pre><br /><br />Then you'll be able to do something like that in your templates:<br /><pre><br /> <p>You are a {{ platform }} user.</p></pre>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com4tag:blogger.com,1999:blog-4345066302652424538.post-42953677584958900822009-03-10T00:00:00.003+01:002009-03-10T00:32:50.733+01:00django-multilingual syntax pollThose days there is some activity in the django model translation area, specially for the two new projects that joined <a href="http://code.google.com/p/django-multilingual/">django-multilingual</a> and <a href="http://code.google.com/p/transdb/">transdb</a> to achieve this: <a 
href="http://code.google.com/p/django-transmeta">django-transmeta</a> and <a href="http://code.google.com/p/django-modeltranslation/">django-modeltranslation</a>.<br /><br />While there are some intentional differences among the projects (for example, django-modeltranslation is the only one that can translate models without editing them), it would be great to merge all (or most) of the existing projects and join efforts to build the best possible application (hopefully one worth including in Django itself).<br /><br />So, with the merge of those applications in mind, we're <a href="http://groups.google.com/group/django-multilingual/browse_thread/thread/2fab1d1674090079">planning</a> to create a branch of django-multilingual that will have the very best of each existing application, plus any other cool ideas.<br /><br />So if you have good Python/Django skills, and want to add some open source work to your CV... ;) join us now!<br /><br />Or if you are a potential user of this application, or you just think that your opinion is worth sharing, please fill in the <a href="http://doodle.com/aicvayf8ss2mxm2h"><span style="font-weight:bold;">MODEL SYNTAX POLL</span></a>, or mail <a href="http://groups.google.com/group/django-multilingual/browse_thread/thread/2fab1d1674090079">us</a> with your ideas.<br /><br />Here is a simple sample for each option in the poll:<br /><br />class Translation<br /><pre><br />class MyModel(models.Model):<br />    my_field = CharField()<br /><br />    class Translation(multilingual.Translation):<br />        my_i18n_field = CharField()<br /></pre><br /><br />custom fields<br /><pre><br />class MyModel(models.Model):<br />    my_field = CharField()<br />    my_i18n_field = TransCharField()<br /></pre><br /><br />separate model<br /><pre><br />class MyModel(models.Model):<br />    my_field = CharField()<br />    my_i18n_field = CharField()<br /><br />class MyModelTranslation(TranslationOptions):<br />    fields = ('my_i18n_field',)<br /></pre><br /><br />translate attrs 
in Meta<br /><pre><br />class MyModel(models.Model):<br />    my_field = CharField()<br />    my_i18n_field = CharField()<br /><br />    class Meta:<br />        translate = ('my_i18n_field',)<br /></pre><br /><br />translate=True in field options<br /><pre><br />class MyModel(models.Model):<br />    my_field = CharField()<br />    my_i18n_field = CharField(translate=True)<br /><br /></pre><br /><br />Do you have a better idea?<br />Just leave a comment here,<br />or send a mail to this <a href="http://groups.google.com/group/django-multilingual/browse_thread/thread/2fab1d1674090079">thread</a>Marchttp://www.blogger.com/profile/01286849404527531329noreply@blogger.com5