In the past couple of days, I tried to run some map-reduce jobs on EMR through Python streaming. The API I used is
boto. It's really basic and not well documented. I was only able to find one
example. One thing I learned the hard way concerns the data coming out of Hive. Surprisingly, no matter what the input format is (in terms of separators), the data coming out of Hive is always Ctrl-A (the byte '\x01') separated. Check
this.
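
To make the Ctrl-A point concrete, here is a minimal sketch of a streaming mapper that splits Hive output on that separator. The names `parse_hive_line` and `main` are my own, and treating the first column as the key and emitting a count of 1 is just an assumption for illustration, not something from the actual job.

```python
#!/usr/bin/env python
import sys

# Ctrl-A, the field separator seen in the Hive output described above.
HIVE_FIELD_SEP = "\x01"


def parse_hive_line(line, sep=HIVE_FIELD_SEP):
    """Split one line of Hive output into a list of field strings."""
    return line.rstrip("\n").split(sep)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop streaming feeds records on stdin, one per line.
    for line in stdin:
        fields = parse_hive_line(line)
        # Assumption for illustration: the first column is the key.
        # Emit "key<TAB>1", the usual streaming key/value convention.
        stdout.write("%s\t1\n" % fields[0])


if __name__ == "__main__":
    main()
```

Splitting on a tab or a comma here would leave each row as one big field, which is exactly the kind of silent failure that is hard to spot from the job logs.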