Sunday, August 30, 2015

Art and Science - Symmetry Poems

The idea behind symmetry is a simple one: what operations can you perform on an object while preserving its appearance or structure? From this simple question, an infinite variety of possible patterns emerges--as the abundant symmetries found in mathematics, art, and nature attest. To explore the possibilities of symmetry in poetry in a manageable way, we need to restrict our focus to a certain class of symmetries known as frieze patterns.

Frieze patterns are symmetrical patterns that extend infinitely in one direction, like a number line. Every frieze pattern contains at least one symmetry: translational symmetry, which guarantees that the pattern repeats in space after a finite distance. Shifting the entire pattern by integer multiples of this distance preserves the structure. In addition to translational symmetry, frieze patterns can also contain reflections about horizontal and vertical lines, as well as 180º rotations and glide reflections about horizontal lines (glide reflections are simply reflections combined with translations). An alternative way to visualize frieze patterns is to imagine building them. Start with a simple, asymmetrical shape to use as a seed, and then transform it using rotations, reflections, translations, and glide reflections as needed. Amazingly, with these four types of transformations, it's only possible to build seven distinct symmetric structures out of the same seed.¹ These seven patterns are referred to as symmetry groups, and frieze patterns represent a specific type of structure called a two-dimensional line group.²
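As a rough illustration (my own sketch, not from the original post), several of the seven patterns can be mimicked in plain text by using the letter "b" as an asymmetric seed, with "d", "p", and "q" standing in for its transformed copies:

```python
# A toy illustration of frieze patterns, using the letter "b" as an
# asymmetric seed: "d" is b reflected across a vertical line, "p" is b
# reflected across a horizontal line, and "q" is b rotated 180 degrees.

def hop(n=4):
    return "b" * n                     # translations only

def sidle(n=4):
    return "bd" * n                    # vertical reflections + translations

def spinning_hop(n=4):
    return "bq" * n                    # 180-degree rotations + translations

def jump(n=4):
    return "b" * n + "\n" + "p" * n    # horizontal reflections + translations

def step(n=4):
    # glide reflection: reflect across the horizontal axis, then
    # shift by half a repeat unit
    return "b " * n + "\n" + " p" * n

for pattern in (hop, sidle, spinning_hop, jump, step):
    print(pattern.__name__)
    print(pattern())
```

Only five of the seven groups are shown here; the remaining two combine several of these reflections and rotations at once.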

In an earlier post, I created a graphic representation of the pattern found in a poetic form called the sestina. A few weeks later, I was experimenting with writing a pantoum, a poetic form I had never tried before. I created a similar graphic in order to visualize the repetition pattern, and it reminded me of frieze patterns I had seen before. I decided to explore symmetries in poetry by starting with a simple seed that links two lines in different stanzas. Then, I colored each frieze pattern in order to group together lines that were similar.

Two similar lines might:

  • Rhyme with each other, as in a sonnet,
  • Contain the same number of syllables, as in a limerick or ballad,
  • Have the same first word or last word, as in a sestina, or
  • Be the exact same line, as in a pantoum or villanelle!

Below are the poetic forms I found for each symmetry group, along with the name of the pattern each poem is based on and a color- and letter-coded similarity scheme for each stanza (two blue a's, for instance, represent lines that are similar to each other in some way). Feel free to write your own poems using these styles, or modify the patterns to make up new styles as you see fit. For instance, I like to preserve the rhyme pattern all the way to the end, and then repeat the rhymes I used at the beginning so the poem is cyclical. I haven't tried most of these forms yet, so I don't know which ones work best! I would love to hear about any discoveries you make.

Hop: A four-line³ sestina.⁴


Step:


Sidle:



Spinning Hop: A pantoum arranged into four-line stanzas.



Spinning Sidle:



Jump: A terza rima.


Spinning Jump:


***

1 A fun seed to start with is the shape of your footprint. The names of the symmetry groups might give you a hint on how to produce each pattern.
2 For more on the mathematics of symmetry and group theory, I recommend this book.
3 Of course, a sestina doesn't necessarily need to have four lines...
4 There are lots of possibilities for a "hop" type poem, because it has the least symmetry--translational symmetry alone.

Wednesday, August 26, 2015

Cross-Sectional Astronomy

My research this summer came to an end last week with a seminar at which I presented alongside many other students in Caltech's Summer Undergraduate Research Program. In addition to presenting my work with Monte Carlo simulations, I also attended talks given by other students doing research in astronomy and physics.

Many of the astronomy projects I learned about focused on creating software for recognizing and analyzing different astronomical phenomena, from variable stars to pulsars and contact binary systems. Many large-scale sky surveys, such as the Palomar Transient Factory and the Sloan Digital Sky Survey, produce a wealth of data on astronomical objects. Computers are often the best way to analyze the abundance of data produced by these surveys in order to identify interesting targets for follow-up study. But why do astronomers need these huge sky surveys and millions of target objects to study?

Analyzing how any population changes over time, whether it is a population of people, stars, or starfish, is a common problem in many areas of science. It can be a tricky problem, too, especially when trying to tease apart correlation and causation from subtle differences between subgroups of the population. There are two main study methodologies for dealing with this problem: longitudinal studies and cross-sectional studies.

Longitudinal studies are the intuitive approach to learning how a population changes over time: just watch as the population (or more realistically, a random sample of the population) evolves naturally. It makes sense, but it's difficult in a lot of situations. For example, longitudinal studies of humans take dedication and decades of research. For phenomena with long lifespans, such as stars, this type of study is simply impossible--the stars vastly outlast human lives and even human civilizations!

Cross-sectional studies instead examine many individuals in the population at the same time. Each one represents the population at a slightly different stage of evolution, with slightly different characteristics: a random sample provided by nature. In humans, an example of a cross-sectional study is gathering pictures of many different individuals at different ages in order to examine how appearance changes with age.

Since astronomers only have access to a snapshot of the universe as it appears today, cross-sectional studies are what astronomers use to study populations of stars. The most famous example of such a study is the Hertzsprung-Russell diagram, a plot that correlates stars' surface temperatures (or colors) with their luminosities. The diagram shows stars in different stages of their evolution, from main sequence stars to red giants and white dwarfs, along with stars in transitional states between these major milestones. With the diagram, we can trace the development of different types of stars, and see how this development changes with different intrinsic properties of the star (mass turns out to be the most important property in determining a star's ultimate fate).
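As an aside (my own illustration, not from the post): the two axes of the Hertzsprung-Russell diagram are physically linked, because for an approximately blackbody star the Stefan-Boltzmann law gives L = 4πR²σT⁴. A quick sketch shows that plugging in the Sun's radius and effective temperature recovers the familiar solar luminosity:

```python
import math

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def luminosity(radius_m, temp_k):
    """Luminosity of a blackbody sphere: L = 4 * pi * R^2 * sigma * T^4."""
    return 4.0 * math.pi * radius_m**2 * SIGMA * temp_k**4

# Sanity check with the Sun (R ~ 6.957e8 m, T_eff ~ 5772 K):
L_sun = luminosity(6.957e8, 5772)  # ~3.8e26 W, the solar luminosity
```

This is why position on the diagram encodes so much: two stars at the same temperature but different luminosities must differ in radius.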

There are some problems with the cross-sectional approach. For example, age itself may correlate with the evolution of the population in question. In the human example, improving health as time goes on might manifest itself in physical differences, such as an increase in height, between generations that are not caused by the aging process itself. In astronomy, a star that is now nearing the end of its life formed in a quite different universe than a protostar that has just reached the main sequence. We know from theoretical models that the concentration of metals in the universe has increased with time as stars convert hydrogen and helium into heavier elements. Luckily, we can attempt to correct for these effects. Due to the finite speed of light and the vast size of the universe, by looking further and further away, we effectively look back in time. This can help us determine how conditions differed for older stars when they formed, compared to stars forming today.

Having a large sample size is important in a cross-sectional study because it ensures that a representative sample is available and that no important features of the population will be missed. Cross-sectional methods and the large samples provided by surveys help astronomers discover how stars age, correlate properties among different populations of stars, and experimentally confirm hypotheses about many types of astronomical objects. There is still much to be learned about a variety of astronomical systems--stars, planets, and more.

Thursday, August 13, 2015

DIY Random Distributions

In this post, I explained how to generate a random, uniform distribution of points on a disk. That problem turns out to be a special case of a more general technique that can be used to generate random numbers with any probability distribution you want. As a bonus, it also explains the seemingly magical fix (taking the square root of a random number to find r) that generates the desired result. But it's not magic--it's a really cool bit of mathematics.

As in my previous post on stochastic geometry, assume that you have a random number generator that outputs numbers randomly selected from 0 to 1. Each possible number has an equal chance of being chosen, so if you plotted how often each number was picked, you would get a uniform distribution.

Let's say, instead, you wanted to generate random numbers between 0 and 1 with a probability distribution function proportional to the polynomial p(x) = -8x⁴ + 8x³ - 2x² + 2x. Broadly speaking, we'd like to pick numbers around 0.7 the most often, with larger numbers being generated more often than smaller numbers. The distribution looks something like this¹:


The first step is to find the cumulative distribution function c(x) of the probability distribution function p(x). If you imagine the chart above as a histogram, the cumulative distribution function would give, for every x, the total height of all the bars to the left of that x value. In other words, the cumulative distribution function gives the proportion of the area under the curve that lies to the left of x compared to the total area under the curve. This should sound familiar if you've ever taken a calculus course--to find c(x), we take the integral of p(x) from 0 to x and divide by the integral of p(x) from 0 to 1. If you haven't taken calculus, don't worry. Taking an integral in this context just means finding the area under a curve over an interval, as described earlier.

Here is what c(x) looks like, plotted alongside p(x). 




The next step is the easiest. Use the random number generator to generate as many random numbers as you need between 0 and 1. I picked five: 0.77375, 0.55492, 0.08021, 0.51151, and 0.18437.²

Now, using c(x) as a sort of translator, we can figure out which random numbers in our non-uniform distribution these numbers correspond to. It's important to realize that the random numbers we generated are values of c(x), not values of x. No matter what interval we use, c(x) will always range from 0 to 1, but the probability distribution itself could have x values spanning any interval of the real line. In my research, I use this technique to generate random angles with values from 0 to π, for instance. So, using these values of c(x), we can interpolate to find the values of x that they correspond to.

Here is the process of interpolation for the numbers I chose. The red points represent the uniformly distributed random values for c(x). The yellow points represent the randomly generated x values that have the same probability distribution as p(x). Very roughly, 0.77375, 0.55492, 0.08021, 0.51151, and 0.18437 correspond to 0.78, 0.65, 0.25, 0.62, and 0.38 respectively, via the green graph of c(x). Although it's hard to tell right now, if I generated enough numbers, we would indeed find we were picking numbers around 0.7 the most often, with more large numbers being generated than small numbers.
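Numerically, this interpolation step is a one-liner with NumPy's np.interp, because c(x) is monotonically increasing. A sketch using the same p(x) as above (variable names are mine):

```python
import numpy as np

def c(x):
    # cumulative distribution of p(x) = -8x^4 + 8x^3 - 2x^2 + 2x
    P = lambda t: -8*t**5/5 + 2*t**4 - (2/3)*t**3 + t**2
    return P(x) / P(1)

xs = np.linspace(0.0, 1.0, 1001)  # a fine grid of x values
cdf = c(xs)                       # c evaluated on the grid (monotone increasing)

u = np.array([0.77375, 0.55492, 0.08021, 0.51151, 0.18437])
samples = np.interp(u, cdf, xs)   # invert c by interpolating x against c(x)
print(samples)                    # roughly [0.78, 0.65, 0.25, 0.62, 0.38]
```

Swapping the roles of the two arrays in np.interp is exactly the "use c(x) as a translator" idea: we look up each uniform draw on the c axis and read off the corresponding x.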


In the case of generating random points over a disk, we need to generate random values of r. We are more likely to find points at larger radii than smaller radii simply because a circle with a larger radius has a greater circumference: circumference is proportional to radius. Thus, our p(r) is proportional to r, and our c(r) is proportional to r². This is why we need to take the non-intuitive step of taking the square root when generating uniform, random coordinates for the disk! While to our eyes the result looks like a uniform covering of the disk, the underlying distribution of radii isn't uniform at all.
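Putting the pieces together for the disk, here is a minimal sketch (my own function name) where the square root is exactly the inversion of c(r) = r²/R²:

```python
import math
import random

def random_point_on_disk(R=1.0):
    """Uniform random point on a disk of radius R."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    r = R * math.sqrt(random.random())  # invert c(r) = r^2 / R^2
    return r * math.cos(theta), r * math.sin(theta)

random.seed(0)  # seeded only to make this sketch reproducible
points = [random_point_on_disk() for _ in range(100_000)]
```

One way to see the non-uniformity of r directly: the mean radius of the sample comes out near 2R/3, the expected value for p(r) proportional to r, rather than the R/2 a uniform r would give.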

This technique is also useful if you want to generate random values according to a Gaussian distribution, also known as a normal distribution or bell curve. These distributions are ubiquitous in statistics, and if you are familiar with image processing, they are the functions behind "Gaussian blur". But of course, the technique can be used to generate any probability distribution you like, not just these examples.
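For the Gaussian case, Python's standard library already exposes the inverse cumulative distribution function, so the whole technique collapses to one line per sample (a sketch using statistics.NormalDist, available since Python 3.8):

```python
import random
from statistics import NormalDist

gaussian = NormalDist(mu=0.0, sigma=1.0)

random.seed(42)  # seeded only to make this sketch reproducible
# Feed uniform random numbers through the inverse CDF, exactly as before.
samples = [gaussian.inv_cdf(random.random()) for _ in range(10_000)]
```

The sample mean and variance land near 0 and 1, as expected for a standard normal distribution.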

***

1 I picked this distribution because it's very easy to integrate and visually interesting. It's not actually related to my research at all, and I don't think there's anything especially interesting about it.
2 I really did generate these numbers with my computer--I didn't cherry pick them to look good!