Of course, brute-force plotting of every point won't work. The points pile on top of each other, so the density of the data gets lost in a sea of points, and the command takes forever to run. A better solution is a hexbin plot, which can handle a million or more data points. The data points are first assigned to hexagons that cover the plotting area. Then the points in each hexagon are counted. Finally, the hexagons are drawn with colors from a color ramp according to their counts. R has a hexbin package that draws hexbin plots and offers a few more interesting functions. The R ggplot2 package also has a stat_binhex function.
Hexbinning those 13 million data points
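To make the binning step concrete, here is a minimal sketch in plain Python of the idea described above: assign each point to a hexagon, then count the points per hexagon. It is an illustration only, not the code used by the R hexbin package or by ggplot2. The function name `hexbin_counts` and the parameter `width` (the distance between neighboring hexagon centers) are made up for this example. The sketch relies on the fact that the centers of a hexagonal grid can be split into two staggered rectangular lattices, so the hexagon containing a point is whichever of the two nearest lattice centers is closer.

```python
import math
import random
from collections import Counter


def hexbin_counts(points, width):
    """Count points per hexagon.

    `width` is the distance between neighbouring hexagon centres
    (a hypothetical parameter name chosen for this sketch).
    Returns a Counter mapping (centre_x, centre_y) -> number of points.
    """
    height = width * math.sqrt(3)  # row spacing within one lattice
    counts = Counter()
    for x, y in points:
        # Candidate A: nearest centre on the lattice (i * width, j * height).
        ax = round(x / width) * width
        ay = round(y / height) * height
        # Candidate B: nearest centre on the same lattice shifted by half a cell.
        bx = (math.floor(x / width) + 0.5) * width
        by = (math.floor(y / height) + 0.5) * height
        # The point belongs to the hexagon whose centre is closer.
        dist_a = (x - ax) ** 2 + (y - ay) ** 2
        dist_b = (x - bx) ** 2 + (y - by) ** 2
        centre = (ax, ay) if dist_a <= dist_b else (bx, by)
        counts[centre] += 1
    return counts


if __name__ == "__main__":
    # Made-up demo data: 100,000 points from a 2-D Gaussian.
    random.seed(0)
    pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
    cells = hexbin_counts(pts, width=0.25)
    print("occupied hexagons:", len(cells))
    print("largest count:", max(cells.values()))
```

The plot then only has to draw one colored hexagon per occupied cell, so the drawing cost depends on the number of hexagons rather than on the number of data points.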
- Enrico Bertini has a post on what can be done to visualize a lot of data.
- Zachary Forest Johnson has a post devoted to hexbins exclusively, which is very helpful.
- Last but not least, the hexbin package documentation explains why it uses hexagons rather than squares, and describes the algorithms that generate those hexagons.