Broadening Participation in Data Mining
I just attended the Broadening Participation in Data Mining workshop, co-located with the Knowledge Discovery and Data Mining (KDD) conference. The workshop was designed to provide exposure and professional development to groups underrepresented in data mining, and to give them the experience of attending one of the top conferences in the field. The program included two keynote talks, mentoring sessions, in-the-lab sessions, and panels on publishing and career choices.

One keynote was by Natasha Balac from the UCSD Supercomputer Center. She spoke about what data science is and how to sort out what the term actually means. She described the four V’s used to define “big data”: volume, velocity, variety, and veracity. In other words, big data is a large amount of data, in mixed forms, that is generated rapidly and carries some uncertainty. I found a good infographic defining it in more detail. She also described the work of the supercomputer center, how she applied her PhD in machine learning and artificial intelligence there, and how she went on to become director of the Predictive Analytics Center of Excellence.
The most helpful sessions were the mentoring sessions. The attendees were PhD students and postdocs, and the mentors were PhDs working in both industry and academia. They gave us tips on thesis development, career paths, and personal branding. My favorite piece of advice came from one of the organizers, Professor [Brandeis Hill Marshall](http://web.ics.purdue.edu/~brandeis/index.html): as a prospective faculty member, you are the “COO, company of one.” The mentors also gave tips on forming a thesis committee and on what to include in a job talk, and they described the range of job talks they had been asked to give for different positions.
The workshop also expanded my network within the field.