October 15, 2009

Monthly Bay Area Hadoop user group

What: Bay Area Hadoop user group
Where: Yahoo! Sunnyvale Campus
When: Wednesday (Oct 21st), 6PM

Registration and agenda are available here: http://www.meetup.com/hadoop/calendar/11532125/


6-6:15 – Socializing and Beers

6:15-7:00 – Mumak – Using Simulation for Large-scale Distributed System Verification and Debugging. Hong Tang, Yahoo!

7:00-7:30 – Cloudera Desktop in Details. Philip Zeyliger, Cloudera

7:30-8:00 – Q&A and Open Discussion

October 9, 2009

4th Hadoop User Group DC

When: Friday, October 16, 2009 at 6:30 PM.

Where: Sheraton Reston 11810 Sunrise Valley Drive, Reston, VA 20191

Registration: http://www.meetup.com/Hadoop-DC/calendar/11303458/
Webpage: http://www.meetup.com/Hadoop-DC/


Follow Lalit Kapoor (http://twitter.com/idefine) on Twitter and the Hadoop mailing list for updates.

October 7, 2009

Boston Hadoop Meetup

The first meeting of the Boston Hadoop Meetup is scheduled to take place on Wednesday, October 28th, 7 pm, at the HubSpot offices:


(HubSpot is at 1 Broadway, Cambridge, on the fifth floor. There Will Be Food.)

There will be 20-minute presentations, time for questions, as well as 5-minute lightning talks.

Contact Dan Milstein (dmilstein@hubspot.com) for more information.

September 29, 2009

NSF, Google, IBM CLuE PI Meeting: October 5, 2009

Information taken from the announcement on the Hadoop Common user list by Jimmy Lin.

==CLuE PI Meeting 2009==

Monday, October 5, 2009
Computer History Museum
Mountain View, California

Sponsored by the National Science Foundation, Google, IBM
Organized by the University of Maryland Cloud Computing Center

Website: https://wiki.umiacs.umd.edu/ccc/index.php/CLuE_PI_Meeting_2009
Registration: http://clue2009.eventbrite.com/

= What’s this event about?

In October 2007, Google and IBM announced the first pilot phase of the
Academic Cloud Computing Initiative (ACCI), which granted several
prominent U.S. universities access to a large computer cluster running
Hadoop, an open source distributed computing platform inspired by
Google’s file system and MapReduce programming model. In February 2008,
the ACCI partnered with the National Science Foundation to provide grant
funding to academic researchers interested in exploring large-data
applications that could take advantage of this infrastructure. This
resulted in the creation of the Cluster Exploratory (CLuE) program led
by Dr. Jim French, which currently funds 14 projects. See this NSF Press
Release for a short description of all the projects funded under the
CLuE program.

Nearing the two-year anniversary of this collaboration, the National
Science Foundation, Google, and IBM will be jointly sponsoring a meeting
for the CLuE project principal investigators (PIs). This event will be
open to the public; in fact, the explicit goal of this event is to
showcase the exciting research currently underway in academia and
promote closer ties with the broader “cloud computing” community in the
Bay Area.

= Schedule at a Glance

See below for an overview of talks scheduled for the day. We are pleased
to welcome two keynotes, by Hamid Pirahesh from IBM (in the morning) and
Luiz Barroso from Google (in the afternoon). The meeting will be capped
off with a poster reception in the early evening, where representatives
of all CLuE projects will present their work in a more informal setting.

Morning Session

(07:30 – 08:00) Registration and breakfast

(08:00 – 08:30) Introductions

(08:30 – 09:15) IBM keynote: Impact of Cloud Computing on Research in
Extreme Scale Analytics. Hamid Pirahesh

(09:15 – 09:40) Topic-Partitioned Search Engine Indexes. Jamie Callan,
Jaime Arguello, Anagha Kulkarni (CMU)

(09:40 – 10:05) Indexing Geospatial Data with MapReduce. Naphtali Rishe,
Vagelis Hristidis, Raju Rangaswami, Ouri Wolfson, Howard Ho, Ariel Cary,
Zhengguo Sun, Lester Melendes (Florida International University)

(10:05 – 10:30) Morning coffee break

(10:30 – 10:55) Scalable Graph Processing in Data Center Environments.
Ben Zhao, Xifeng Yan, Divyakant Agrawal, Amr El Abbadi (University of
California, Santa Barbara)

(10:55 – 11:20) Large-Scale Data Cleaning Using Hadoop. Chen Li, Michael
Carey, Alexander Behm, Shengyue Ji, Rares Vernica (University of
California, Irvine)

(11:20 – 11:45) Cluster Computing for Statistical Machine Translation.
Stephan Vogel, Qin Gao, Noah Smith, Kevin Gimpel, Alok Parlikar, Andreas
Zollmann (CMU)

(11:45 – 12:10) Research and Education with MapReduce/Hadoop:
Data-Intensive Text Processing and Beyond. Jimmy Lin, Tamer Elsayed,
Chris Dyer, Philip Resnik, Doug Oard (University of Maryland)

Afternoon Session

(1:00 – 1:45) Google keynote: Datacenter-Scale Computing. Luiz André Barroso

(1:45 – 2:10) A Performance and Usability Comparison of Hadoop and
Relational Database Systems. Sam Madden, Andrew Pavlo, Erik Paulson,
Alexander Rasin, Daniel Abadi, David DeWitt, Michael Stonebraker (MIT,
Brown, University of Wisconsin, Microsoft, Yale)

(2:10 – 2:35) HadoopDB: An Architectural Hybrid of MapReduce and DBMS
Technologies for Analytical Workloads. Daniel Abadi, Azza Abouzeid,
Kamil Bajda-Pawlikowski (Yale University)

(2:35 – 3:00) Towards Interactive Visualization in the Cloud. Bill Howe,
Huy Vo, Claudio Silva, Juliana Freire, YingYi Bu (University of
Washington, University of Utah)

(3:00 – 3:30) Afternoon coffee break

(3:30 – 3:55) Scaling the Sky with MapReduce/Hadoop. Andrew Connolly,
Jeff Gardner, Simon Krughoff (University of Washington)

(3:55 – 4:20) Commodity Computing in Genomics Research. Mihai Pop, Mike
Schatz (University of Maryland)

(4:20 – 4:45) Relaxed Synchronization and Eager Scheduling in MapReduce.
Ananth Grama, Suresh Jagannathan (Purdue University)

(4:45 – 5:10) Dynamic Provisioning of Data Intensive Applications.
Chaitanya Baru, Sriram Krishnan (San Diego Supercomputer
Center/University of California, San Diego)

Katta meetup in Palo Alto

The Katta developers are meeting today (September 29th) at 7pm local time at http://www.roseandcrownpa.com/ for some drinks, food, and interesting discussions.

Hadoop/ Lucene/ Apache “Cloud” Stack Meetup

When: Wednesday, September 30th, at 6:45 pm.

Where: University of Washington, Allen Computer Science Center (not Computer Engineering)
Map: http://www.washington.edu/home/maps/?CSE
Room: 303 -or- the Entry level.

More Info: The meetup is about 2 hours (and there’s usually food): we’ll have two in-depth talks of 15-20 minutes each, and then several “lightning talks” of 5 minutes.

Contact: Bradford Stephens

Cascading Meetup – September 24th

Rapleaf hosted a Cascading meetup last Thursday.

Topics featured at the meetup:

  • Bradford Cross from FlightCaster discussed his work with Clojure + Cascading.
  • Nathan Marz from Rapleaf covered how Rapleaf uses Cascading to do large-scale batch querying.
  • Chris Wensel from Concurrent, Inc. covered Cascading 1.1, 1.5, and the roadmap of changes.

September 9, 2009

First NoSQL Meetup Germany

The first NoSQL Meetup Germany will take place on October 22nd in Berlin.

Submit your talks by September 22nd; the schedule will be published soon after, leaving plenty of time to book your flights to Berlin.

September 3, 2009

Third Hadoop in China event

Sponsored by Yahoo! and Cloudera, the third Hadoop in China event is scheduled to take place in November 2009. The CfP and registration page are currently open.

  • What: Hadoop in China event.
  • Where: Beijing
  • When: November 15th, 2009.

See http://www.hadooper.cn/hadoop/cgi-bin/moin.cgi/thirdcfp for more information.

August 28, 2009

Apache Lucene/ Solr Meetup – SFBay – September 3rd

Erik Hatcher is organizing another Lucene/ Solr Meetup in the SFBay area next month. Further details:

What: SFBay Apache Lucene/Solr September Meetup
When: September 3, 2009 6:30 PM
Where: Computer History Museum, 1401 N Shoreline Blvd, Mountain View, CA 94043

Presentations and discussions on Lucene/Solr, the Apache Open Source Search Engine/Platform — featuring:
• “Lucene Search Performance Analysis”: Andrzej Bialecki, Nutch Committer / Luke author
• “Can you find what they found? Solr @ Digg.com”: Sammy Yu, Digg.com Search Development
• “Looking at Solr Relevancy”: Mark Bennett, New Idea Engineering
• “Search at Netflix and beyond”: Walter Underwood, Search Veteran–Infoseek, Verity, Netflix
• “Innovations in search and social media”: Brian Pinkerton, Chief Architect, Lucid Imagination

More talks posted shortly! Presentations followed by Lightning Talks from community members. Lightning talks open for registration soon.

