Sarah M Brown

My first Software Carpentry Workshop

4/10/18

Last month, I spent a weekend across the bay at UCSF teaching introductory Python with the SWC Python Gapminder lesson in a workshop hosted by the UCSF Library. My first experience teaching with the Carpentries went well, I think. It was fun and exhausting. After the second day, I turned down meeting friends in SF to do tourist things and instead went home and sat. In the dark. In silence. Although I find teaching fun, I also identify strongly as an introvert and find interacting with people for long periods of time exhausting. I recharge by being alone.

Now that I’ve had some time to recover, both from the workshop and from having booked too many things back to back, I’ve sorted through my thoughts and want to share about my first teaching experience.

Preparing to teach

I had plenty of time to prepare for this; I had even set aside some time well in advance to review the specific curriculum. Then I didn’t, until the day before. There were two main things I felt I needed in order to be ready to teach. First, I wanted to be familiar with the specific dataset we would work with and the order of the content in the lesson; I was confident with the material itself, but knowing the order makes answering questions and providing extra context easier, and I didn’t want to be reading too much from the notes. Second, I didn’t know logistically how I would handle the exercises, and I wanted a clear idea of how they would work.

I ended up only going through the material in depth the afternoon before. I spent about 4 hours reading through it all and preparing. For 6 hours of teaching, that’s a reasonable amount of preparation: if I can walk myself through it in much less time than the workshop allows, there’s a chance of getting through it with novices. More time would only have left me making up extra content that we would never reach and being over-prepared.

I prepared by forking the lesson material and making notes in my own copy about additional things I wanted to do or changes I wanted to make. I then pushed my changes to my fork and served them. Inspired by a tip I got during my instructor training, during the workshop I pointed my tablet to my version of the lesson material. That worked well for me: I had notes visible to me and could still leave my whole screen for the students.

Preparing Exercises

For the exercises: I was preparing to teach this workshop while also preparing to teach a mini version of the Ecology lesson at the NSBE Convention. For that 90-minute time constraint, I had found out about the %load magic in Jupyter notebooks, which reads a file (or an excerpt of one) into a notebook cell for editing. I decided to use this for the exercises of this workshop too. As I went through the material, I copied the exercises into separate .py files for this purpose. I set them all up in a separate GitHub repository that also had a data folder.
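For anyone who hasn’t used it, what %load does is simple: it pulls a file’s text into the cell for editing. A minimal sketch of the behavior (the exercise filename and contents here are made up for illustration):

```python
from pathlib import Path
import tempfile

# Write a hypothetical exercise file, like those distributed to learners.
tmpdir = Path(tempfile.mkdtemp())
exercise = tmpdir / "exercise_01.py"
exercise.write_text("# Fill in the blank, then run the cell:\nprint('hello, ____')\n")

# In a notebook, running `%load exercise_01.py` in a cell replaces the
# cell's contents with the file's text, ready for learners to edit.
# Under the hood, the magic does roughly this:
cell_contents = exercise.read_text()
print(cell_contents)
```

Learners then edit the loaded code in place and run the cell, which keeps typing to a minimum while still having them engage with the code.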

Since GitHub also allows people to download a .zip of the repository, using git to host it meant I could keep the workflow I’m comfortable with, including last-minute updates, while still letting the learners download a single file of a type they were already familiar with. Before the workshop started, I made a branch as a snapshot of the content and shortened the URL to that branch’s .zip file. At the workshop, I shared the shortened URL so learners could download the file, had them unzip it and move it to a working location, and then set that as their working directory to start the lesson. After the first day, I made a new branch, added the materials I generated on day 1, fixed a few exercises for day 2, and .gitignored the files that were unchanged on the new branch, so the learners could download just the new material. By the time the workshop was over they had learned git, so I directed them to the actual repository page of a post-workshop branch to download my final materials. Afterwards, I made a Jupyter notebook outlining this whole process so that I can reuse it.
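The download links follow a predictable pattern: GitHub serves a zip archive of any branch at a fixed URL, which can then be run through a URL shortener. A small sketch (the repository name and branch here are hypothetical, not the ones I actually used):

```python
def archive_url(user: str, repo: str, branch: str) -> str:
    """Return the GitHub zip-archive URL for a given branch of a repository."""
    return f"https://github.com/{user}/{repo}/archive/{branch}.zip"

# One snapshot branch per workshop day means one short link per day.
print(archive_url("brownsarahm", "workshop-exercises", "day1"))
```

Making a fresh snapshot branch per day is what lets the day-2 link contain only the new material.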

Actually teaching

I started a little fast, but the helpers, who had more Carpentries experience than me, and the sticky-note system helped me slow down as we went. In the Carpentries, we provide all learners with sticky notes in two colors at the beginning of the workshop. We then instruct learners to put a red sticky on their laptop, like a flag visible from both in front of and behind them, when they have trouble. When we break for exercises, we can also instruct learners to put the blue (or, ideally, green) sticky up when they’re ready to move on.

Using the %load magic for the exercises worked pretty well, though getting used to some features of the notebooks was a little hard. Some of the activities required, and others were just easier after, splitting the imported .py file into multiple notebook cells, which took some getting used to. A few activities were meant to reinforce concepts more than to actually write code, so those were best formatted as Markdown cells instead of code cells; switching cell types also caused some friction at the beginning. By the second day, though, most learners seemed comfortable with these activities, and I hope that the push to learn a little more about manipulating Jupyter notebooks makes using them in their own work that much more accessible.

We were running behind; on day two, getting everyone started in the same place again took more time than I expected, as did many of the exercises. The team at UCSF runs a lot of workshops and has found that the advanced part of the git lesson tends to frustrate and demotivate learners at the end of the day, so instead they offered me extra time to continue with Python. We still didn’t finish, but we made it somewhat farther with the extra hour.

Overall it was a great first experience, and I’m excited to be a member of the Carpentries community.

Race & AI Panel

ML at Berkeley

4/5/18

I gave a talk to Berkeley undergraduates at the Machine Learning at Berkeley (ML@B) general meeting. I talked about work in progress, and we had a great discussion interpreting the results. They asked great questions about both the technical details of the work and its broader implications and positioning.

The abstract I provided for the talk is here: In the age of Big Data, we now have data for an abundance of new concepts that have historically been studied only qualitatively. Data science tools make working with data accessible even to those without a background in the underlying statistics. Together, these facts mean that the way machine learning algorithms are being used is often quite different from the use cases imagined when they were designed. My work aims to answer the question: how do we need to adapt or augment machine learning algorithms to facilitate data-driven discovery in these domains? In this talk, I’ll frame some of the common technical challenges in my work and show preliminary results on tools I’m building to augment ML algorithms.

Data Carpentry at NSBE

CMU ML Lunch

Data Carpentry Certified Instructor

11/1/17

Now that I’ve received my certificate, it’s official: I’m a Data Carpentry instructor. To earn it, I completed a two-day training, submitted a pull request to a Data Carpentry lesson, and passed a teaching demonstration.

Now I’m ready to teach and am planning a workshop for #NSBE44.

UC Berkeley Chancellor's Postdoctoral Fellow

Getting Unstuck in Writing for Research

5/15/16

I was recently contacted by another graduate student for advice on how to deal with feeling bogged down by theoretical and mathematical detail while working on a journal paper. This is actually a problem I have a lot. I don’t think I’ve solved it entirely, but I have developed a number of strategies for getting through it.

Learn Strategies

The most general strategy is somewhat circular, but more proactive: I’ve put significant effort toward learning to be a better researcher, learner, writer, and generally productive person. I’ve read some books and countless articles on these matters. Trying to consume all of these materials at once right before a deadline and then magically get your work done is of course not a solution, but slowly working through them over time has made me lose less time and worry less when I face these struggles. I may face them less often, or maybe with about the same frequency, but I lose less time with each occurrence, and I do more complex and more theoretical work than I did at the beginning of graduate school.

This strategy can also help a little immediately. When I get really stuck, I pause and spend 20-30 minutes reading whatever’s next on my ‘get better at x’ list or the book I’m currently working through. After a few minutes of a productive-feeling distraction, I often have a better idea of how to proceed. This used to be my first strategy: I’d spend a few minutes reading and learning about things I could try until I found one that sounded promising. Recently it has fallen lower on my list, because the strategies below get me back on track. Still, I think this is the most important strategy, which is why I mention it first: even though the strategies below help me, they may not help you, so learning about as many strategies as you can and trying them out until you settle into your own toolbox is what matters most.

Start Typing. Don’t Stop.

I open a separate space (for me, 750words.com, which I’ve mentioned previously, or Writebox, which I’ve recently started using occasionally), somewhere the project isn’t, where I have no context and zero pressure for formatting or even correct grammar or spelling. I start typing and don’t stop until I reach a predetermined goal. I most often require myself to push through until I reach 750 words; sometimes I go much longer, and other times, to get there, I write some pretty redundant things (not literally repetitive, though; that’s cheating). Having a minimum I must reach requires me to reach some level of breadth or depth. This strategy has a few sub-components I’ve pulled together from other places, but in general, getting whatever ideas, or stumbling blocks, are on my mind out and in writing helps me move on. The separate space is important for me because there’s no pressure for what I write to fit into a project, and no distraction from the existing text. I have no problem writing nonstop with varying tone, audience, or topic when it’s not in a project, just whatever I need out of my way: a note to my adviser on what I’m stuck on, a new draft of how it could go, another version of the same paragraph from a different perspective, etc. Then, when I’m done, I copy and paste anything useful into the project.

Most often, I start writing about what I’m having trouble writing, coding, or figuring out and why, maybe with a friend or my adviser in mind as the audience. Sometimes that takes a while, but at the end I at least feel better and usually have an idea. Usually, after about a paragraph, I have an idea for how to fix the thing or explain whatever I’m having trouble with. I then start to draft the writing I need; maybe halfway through a paragraph I have a better way to say something, so I just write the second option next. Later, I can pick and choose the best aspects of each version, or I can try to write out a justification for each and use those to decide. This is nonstop writing, no revising; cleanup and sorting come after.

Other times I try to write out instructions to myself for what to do, or the new questions I have to research in order to progress. When stuck with writing specifically, I’ve found that writing out the objectives for the section I’m having trouble with, “by the end of this section the reader should …”, helps a lot. One of my favorite get-unstuck freewriting exercises is to write out the material as the script of a talk I might give to children or other lay people. That text, of course, won’t be useful to copy and paste directly into a manuscript, but it often helps me figure out how to write for the manuscript.

Structured Struggling

Sometimes, struggling through work just has to happen: staring at the problem, thinking about it every imaginable way, reading and rereading, writing and rewriting. Doing this endlessly, however, wouldn’t make any progress, so I do a lot of my work using the pomodoro technique, which I’ve mentioned [previously](http://www.sarahmbrown.org/5-academic-tools/). I set a timer for 25 minutes and have to stay on task for those 25 minutes: no e-mail, texts, calls, social media, anything. I also pick a single specific task on a specific project (e.g., write section 1.4, reorganize chapter 2) that I have to work on the whole time. If I truly finish in less time, I can move on to something that’s a natural successor, but if I’m stuck I have to stay and keep trying until the 25 minutes are up. Then a 5-minute break, and repeat. After 4 rounds, take a longer break. The 25 minutes is often long enough for me to get unstuck, but short enough that I don’t feel like I’m wasting time, and I stop and move on before I mull too long.

Visualize: Digitally, Not Mentally

I’m a very visual learner, so writing is hard for me. Explaining and learning things with diagrams and tables is easier; when I’m troubleshooting code, I generate figures at intermediate steps to see how the data changes and check that it’s working. When I think through how ideas relate, I think of them visually. Sometimes I draft slides or paper figures for a section I’m having trouble with, or for the whole thing if I’m having trouble with organization. While a specific equation might be hard to understand and explain, an annotated plot of it might be easier. Thinking about how to visualize content for quick understanding, which is necessary for good slides, helps me figure out how to explain it in text. It also makes me feel like I’m making progress: I’ve at least got the figures for the paper or the slides ready in advance. With slides, I can zoom out and see a storyboard of the whole project, which provides perspective that is useful for organization.

Analogy

For really abstract theory, try to think of an analogy that could explain the concept to a child. That process helped a lot with a large collaborative project: we needed to explain the mathematical (systems theory) definition of state for an audience of psychologists. We tried a few different analogies until we got one that was simple enough yet covered what we needed. It is included in the paper as a box/sidebar, but it also helped us refine the right words to provide the base definition in the main text. We have a long list of failed candidate analogies we brainstormed in meetings, but each one got us closer to one that actually worked. This can be worked into a free-writing exercise, but I more often do it with a table: I make a row for each aspect and a column for each analogy, then fill in the boxes with how each analogy relates to that aspect of the concept. I often add rows as I go, and sometimes it takes some other figure-like form, but the idea is to relate the work to something more broadly understandable. Giving yourself new ways to think about a concept will help you not only write it down for others to understand, but can even give you new ideas about how to approach the problem or extend your result.

Google Forms for Better Live Discussion

11/3/15

During a workshop I hosted Friday, I was asked how I designed the activity we did. Here’s a quick writeup of how that worked. First, a little context: I presented an 80-minute workshop at the Region 1 FRC. I’ve attended NSBE conferences enough times to know that, no matter how interested I am in a workshop, lack of sleep influences my ability to focus, so I wanted to ensure the workshop was engaging and active. The conference theme this year is engineering a cultural change; my take on this as a machine learning researcher is big data for social change. My objective was that attendees both learn the core ideas of machine learning and big data, so they have context if they follow up further, and realize that it’s an exciting field with lots of room for exploration and discussion. The workshop was formatted with the information loaded toward the front, but we quickly worked into shaping the conversation around the attendees’ interests. I wanted the activities to be challenging and to prompt discussion while remaining accessible, so I made them group activities.

However, in my own experience, too many groups reporting out and sharing their responses to the same questions can get repetitive and boring. To let all groups share while giving myself the ability to select the best groups to share for each portion of the activity, I used Google Forms and had the groups submit their answers at each step. Even without wifi, having participants complete the activity by submitting responses on their smartphones worked great. I wanted the activity to be completed in stages: after some introduction from me, we’d break out, report back, discuss, add new material, and repeat a few times. I also didn’t want the groups to have to type any information repeatedly, while I could still match responses from one breakout part to the next. To achieve this, I set it up for them to “edit their response” and used separate pages of questions for each stage of the activity.

In Google Forms, here’s how to set up a form for use in an activity like the one I ran:

  • Set the first question as a multiple choice question, “Breakout part”, and set “after page 1” to “go to next page”.
  • For each sub-activity, add page breaks and name the pages.
  • Set the various choices of the “Breakout part” question to jump to their respective pages by checking the box “Go to page based on answer” and then setting the “Go to page” field on each choice.
  • On the separate pages, add the questions necessary for each part of the activity, and for each of those pages, set “after page x” to “Submit form”.
  • At the bottom, below the confirmation text, check the box for “Allow responders to edit responses after submitting.”

In testing, I added some dummy responses to each phase. Then I created figures that displayed the results of each step in the different ways I wanted available for discussion. With the figures made, I published them and made short URLs for each published figure that I could put in my slides for use during the workshop. Before the workshop, I cleared my dummy data out of the spreadsheet. In the first activity, I had the groups name their project with an identifier that could be used as a title in subsequent steps. When I gave the instructions for the activity, I reminded them to save the “Edit my response” link from the confirmation page. Having the breakout questions on a form they could open on their own also let me go back to reference slides while they discussed, and meant I didn’t need to print any handouts.

During the workshop, as the attendees completed each activity, I clicked the link on the next slide and we could then use the plot of their results. This made it easy to compare the prevalence of various results and discuss trends. In the first activity, the groups defined a problem they wanted to apply machine learning to; for this, I had all of the groups share their ideas. In the second, they rated how hard various steps of the design process would be for their problem. Since I had them submit the results in the form, I was able to call on groups with atypical or extreme answers to justify their choices, instead of going around and having every group explain their decision making, since much of it would be similar. The third activity was a series of either/or choices about what types of machine learning they might want for their problem. Again, being able to discuss the trends and commonalities immediately after the participants finished their small-group discussions made for a better use of time, and I was able to have groups who chose differently than the others explain their decision making.

At the end of the workshop, the attendees said they enjoyed the workshop format, had learned something new, and found that entering the responses via the form wasn’t too complicated.

I’ve published a simplified version of the Google Form used for the activity as a template.

A Gentle Technical Reading List for Big Data for Social Good

10/30/15

As a machine learning researcher, Big Data for Social Good is my take on this year’s NSBE conference theme of Engineering a Cultural Change. Today, I’m presenting a workshop on that topic at the NSBE Region 1 Fall Regional Conference; there’s so much to share that this post is mostly intended as additional information for the attendees, but I think it could be useful more broadly. My research isn’t exactly on Big Data for Social Good, but I do applied machine learning research, and I think there are some important commonalities. I begin from a real problem and design smart algorithms to help domain experts make sense of their data; this is exactly what a data scientist working at or with a nonprofit would do. In my graduate work, my collaborators have been psychologists who want to ask categorically different questions, questions so novel that traditional experimental designs and analysis techniques don’t get the job done. Since I’ve spent so much of my time outside the classroom and lab dedicated to social impact through NSBE, Big Data for Social Good is a personal interest and a possible future direction for me.

Machine learning and big data appear all over lately, and there are a number of key resources that I think anyone interested in data-driven methods for decision making, even outside of the technical realm, should consider. These are, however, challenging problems, and there is additional research that must be done toward this end. Here I provide a list of some of my favorite (mostly) accessible machine learning papers: good reading material for someone broadly interested in machine learning for social good who is not yet an expert in machine learning. These will help you begin to get some perspective on the relevant technical matters and research questions without being bogged down by details.

Model Based Machine Learning

In this paper, the author develops the storyline of what model-based, as opposed to feature-engineering-based, machine learning looks like. Along the way, he describes the basics of probabilistic graphical models, a language for expressing statistical models, and concludes with a new type of programming language designed specifically for machine learning. This paper is helpful not only because its easy-to-read tutorial provides knowledge that will make reading many other papers easier, but because models are a widely understood idea: many other researchers use models of some form. For me, that makes model-based machine learning especially important when working with data from any application domain, especially one where decision makers may not be as strong in math and computer science. There’s also a forthcoming book on the topic.

Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency

There has been a series of workshops lately on fairness, accountability, and transparency in ML (FATML). This Medium post by Hanna Wallach, who is also a founder of WiML and pretty awesome in general, summarizes a talk she gave at the first FATML workshop. She focuses on data, questions, models, and findings to highlight the state of the field and some of the key challenges in computational social science with respect to fairness, accountability, and transparency. She begins with a few different definitions of big data, covering what makes the current trend different from merely large datasets, and gives a pretty clear overview of some of the challenges facing machine learning when applied to social science problems.

Machine learning: Trends, perspectives, and prospects

This clearly written survey name-drops just about every subarea of machine learning and paints a pretty clear picture of how the areas relate or compare. The article will serve as a great launching point if you think you might be interested in machine learning but want an overview of the field first. It concludes by highlighting some of the core challenges facing machine learning going forward.

Machine Learning that Matters

This paper reads more as a position paper, it’s again, not technical but this essentially argues that too many machine learning papers and journals focus on the incremental discovery of machine learning techniques without actually attending to the data on which the methods are applied. It highlights a systematic problem in machine learning: reusing the same data sets repeatedly for tasks that may or may not have actually been of interest to those who generated the dataset. Arbitrary measures of performance declare the author’s proposed new method better than another, but don’t declare clearly why.  There was also a follow up special issue of Machine Learning Journal on Machine Learning for Science and Society.