Stern PhD in Finance project for Robert Engle: estimating volume using UHF (ultra-high-frequency) data

This project asks whether volume clusters in the same way that trades do. In his UHF paper, Robert Engle showed that if two trades occur close together in time, the next one is also likely to arrive quickly.

However, it is also important to know whether, in addition to the number of trades, the traded volume increases during these bursts.

One of the main reasons for this is the market microstructure hypothesis that information arrives in chunks and prompts informed traders to trade. If this is true, the market maker should widen the bid-ask spread when trades become more frequent: informed traders know more about the asset's value than the market maker does, and the probability of trading with an informed trader (as opposed to a liquidity trader) rises as trading becomes more frequent.
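The spread argument can be made concrete with a one-period Glosten-Milgrom-style calculation. This is a hedged illustration, not the project's model: the two possible asset values, the informed fraction `pi`, and the function name `gm_quotes` are assumptions introduced here. The sketch takes the probability of meeting an informed trader as given rather than deriving it from trade frequency, and only shows that the quotes widen as that probability rises.

```python
def gm_quotes(v_low: float, v_high: float, pi: float) -> tuple[float, float]:
    """One-period Glosten-Milgrom-style quotes (illustrative assumption).

    The asset is worth v_low or v_high with probability 1/2 each. A fraction
    pi of traders is informed and buys exactly when the value is v_high
    (sells otherwise); the rest buy or sell with probability 1/2 regardless
    of the value. The market maker sets ask = E[V | buy], bid = E[V | sell].
    """
    p_buy_high = pi + (1 - pi) / 2   # P(buy | V = v_high)
    p_buy_low = (1 - pi) / 2         # P(buy | V = v_low)
    p_buy = 0.5 * p_buy_high + 0.5 * p_buy_low
    ask = (0.5 * p_buy_high * v_high + 0.5 * p_buy_low * v_low) / p_buy

    p_sell_high = 1 - p_buy_high     # P(sell | V = v_high)
    p_sell_low = 1 - p_buy_low       # P(sell | V = v_low)
    p_sell = 0.5 * p_sell_high + 0.5 * p_sell_low
    bid = (0.5 * p_sell_high * v_high + 0.5 * p_sell_low * v_low) / p_sell
    return bid, ask


for pi in (0.0, 0.2, 0.5):
    bid, ask = gm_quotes(99.0, 101.0, pi)
    print(f"pi={pi:.1f}  bid={bid:.2f}  ask={ask:.2f}  spread={ask - bid:.2f}")
```

Working through the algebra, the spread comes out to pi * (v_high - v_low), so in this toy setting it grows linearly with the share of informed traders.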

Currently the situation is much like sitting behind the door of a currency exchange shop, counting the people who go in, and updating your exchange rates accordingly. If a lot of people are going in, you could (and probably should) widen the spread, because something is going on. It would be more useful, though, to know whether those people are also trading in large amounts.

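The clever idea behind the UHF models Engle uses, the autoregressive conditional duration (ACD) models, is that they can be estimated with GARCH methodology. He assumes $Y_i = X_i e_i$, where $Y_i$ is the duration between trades, $X_i$ is its conditional expectation and $e_i$ is a positive i.i.d. error with mean 1. This is just like the GARCH model $Y_t = C + e_t$, $s^2_t = k + q\,e^2_{t-1} + p\,s^2_{t-1}$ with the mean $C$ set to 0: if the square root of the duration is used as the GARCH dependent variable, the variance equation becomes the ACD recursion $X_i = k + q\,Y_{i-1} + p\,X_{i-1}$.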
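A quick numerical check of the GARCH analogy: a minimal NumPy sketch, assuming an ACD(1,1) with unit-mean exponential errors and illustrative parameters k = 0.1, q = 0.1, p = 0.8 (choices made here, not taken from Engle's paper). It simulates durations, then evaluates both the exponential ACD log-likelihood and the zero-mean Gaussian GARCH(1,1) log-likelihood of the square-rooted durations. The helper names are hypothetical.

```python
import numpy as np


def acd_means(y, k, q, p):
    """ACD(1,1) recursion X_i = k + q*Y_{i-1} + p*X_{i-1}.

    The first conditional mean is set to the sample mean (an assumption;
    other initializations are also used in practice).
    """
    x = np.empty(len(y))
    x[0] = y.mean()
    for i in range(1, len(y)):
        x[i] = k + q * y[i - 1] + p * x[i - 1]
    return x


def eacd_loglik(y, k, q, p):
    """Exponential ACD log-likelihood: -sum(log X_i + Y_i / X_i)."""
    x = acd_means(y, k, q, p)
    return -np.sum(np.log(x) + y / x)


def garch_sqrt_loglik(y, k, q, p):
    """Gaussian zero-mean GARCH(1,1) log-likelihood of e_t = sqrt(Y_t).

    Since e_{t-1}^2 = Y_{t-1}, the variance recursion
    s2_t = k + q*e_{t-1}^2 + p*s2_{t-1} is exactly the ACD recursion.
    """
    e = np.sqrt(y)
    s2 = acd_means(e ** 2, k, q, p)
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + e ** 2 / s2)


# Simulate exponential ACD durations with illustrative parameters.
rng = np.random.default_rng(0)
n, k, q, p = 2000, 0.1, 0.1, 0.8
y = np.empty(n)
x_prev, y_prev = 1.0, 1.0  # start at the unconditional mean k / (1 - q - p)
for i in range(n):
    x_prev = k + q * y_prev + p * x_prev
    y_prev = x_prev * rng.exponential(1.0)
    y[i] = y_prev

ll_acd = eacd_loglik(y, k, q, p)
ll_garch = garch_sqrt_loglik(y, k, q, p)
print(ll_garch, 0.5 * ll_acd - 0.5 * n * np.log(2 * np.pi))
```

The two printed numbers agree: for any (k, q, p) the GARCH value is half the ACD value minus the constant (n/2)·log(2π), so both likelihoods peak at the same parameters. That is why a GARCH routine with the mean set to 0 reproduces the exponential ACD estimates; the robust (QML) standard errors are the ones to report, because the Gaussian assumption on square-rooted durations is not literally true.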

I originally tried to do the same thing with volume. However, it became apparent that the time to the next trade and the volume of that trade need to be estimated jointly, so I started thinking about vector GARCH models and how to apply them to ACD.
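To make the joint-estimation idea concrete, here is a hypothetical bivariate specification in the spirit of a vector multiplicative error model: duration and volume each get their own conditional mean, driven by lags of both. The 2x2 matrices `A` and `B`, the independent exponential errors, the parameter values and the function name are all assumptions for illustration; a serious version would need a realistic volume distribution and some dependence between the two errors.

```python
import numpy as np


def vector_mem_loglik(obs, omega, A, B):
    """Log-likelihood of a hypothetical bivariate multiplicative error model.

    obs   : (n, 2) array, column 0 = duration, column 1 = volume (both > 0)
    omega : (2,) intercepts; A, B : (2, 2) coefficient matrices
    Conditional means follow mu_i = omega + A @ obs_{i-1} + B @ mu_{i-1}.
    Off-diagonal entries of A and B let lagged durations move expected
    volume and vice versa. Errors are assumed to be independent unit-mean
    exponentials, which is a simplification.
    """
    obs = np.asarray(obs, dtype=float)
    mu = np.empty_like(obs)
    mu[0] = obs.mean(axis=0)  # initialization assumption
    for i in range(1, len(obs)):
        mu[i] = omega + A @ obs[i - 1] + B @ mu[i - 1]
    if np.any(mu <= 0):
        return -np.inf  # parameters imply a non-positive conditional mean
    return -np.sum(np.log(mu) + obs / mu)


# Made-up data and parameters, only to show the call.
rng = np.random.default_rng(1)
obs = rng.exponential([2.0, 500.0], size=(1000, 2))  # seconds, shares
omega = np.array([0.2, 50.0])
A = np.array([[0.1, 0.0001], [5.0, 0.1]])
B = np.diag([0.8, 0.8])
print(vector_mem_loglik(obs, omega, A, B))
```

Setting the off-diagonal entries of `A` and `B` to zero reduces this to two separate univariate ACD-type models, so testing whether those entries differ from zero is one way to ask whether duration and volume really need to be modeled jointly.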

EViews is no longer sufficient for that, so I learned GARCH estimation in the computational finance class. GARCH toolboxes come with source code that shows how their estimation is actually implemented, so I should be able to modify that code to estimate the joint ACD model.
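The estimation step itself amounts to numerical maximization of a likelihood. As a hedged sketch of what modified estimation code might look like, the block below fits an exponential ACD(1,1) by quasi-maximum likelihood using SciPy's `minimize` (a substitution made here, not the toolbox the text refers to) on simulated durations. The bounds, starting values and helper names are assumptions; the stationarity condition q + p < 1 is not enforced. Extending it to the joint model would mean swapping in a bivariate likelihood as the objective.

```python
import numpy as np
from scipy.optimize import minimize


def neg_eacd_loglik(params, y):
    """Negative exponential ACD(1,1) log-likelihood (objective to minimize)."""
    k, q, p = params
    x = np.empty(len(y))
    x[0] = y.mean()  # initialization assumption
    for i in range(1, len(y)):
        x[i] = k + q * y[i - 1] + p * x[i - 1]
    return np.sum(np.log(x) + y / x)


def fit_eacd(y, start=(0.1, 0.1, 0.7)):
    """Fit (k, q, p) with box constraints via L-BFGS-B."""
    bounds = [(1e-6, None), (0.0, 0.999), (0.0, 0.999)]
    return minimize(neg_eacd_loglik, np.asarray(start), args=(y,),
                    method="L-BFGS-B", bounds=bounds)


# Simulated durations with illustrative true parameters.
rng = np.random.default_rng(0)
n, k, q, p = 2000, 0.1, 0.1, 0.8
y = np.empty(n)
x_prev, y_prev = 1.0, 1.0
for i in range(n):
    x_prev = k + q * y_prev + p * x_prev
    y_prev = x_prev * rng.exponential(1.0)
    y[i] = y_prev

res = fit_eacd(y)
print("estimates (k, q, p):", res.x, "log-likelihood:", -res.fun)
```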

A final caveat is the distribution of the volume. A 1,000-share trade is much more likely to occur than a 900-share trade. The distribution function appears to consist of 7 equations: one each for 100-share lots and 200-share lots, one for all other multiples of 100, and one each for sizes divisible by 500, 1,000, 5,000 and 10,000.
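As a first step toward that distribution, each trade can be tagged with its lot-size group. This is a hypothetical helper: the group labels are made up here, the priority order used when a size belongs to several groups (most specific divisor first) is an assumption, and sizes that are not multiples of 100 go into an extra `odd_lot` bucket outside the 7 groups. The trade sizes in the usage line are made-up examples, not project data.

```python
from collections import Counter


def size_group(shares: int) -> str:
    """Assign a trade size to one of the 7 lot-size groups.

    Overlaps are resolved by taking the most specific group first; for
    example, 10000 is divisible by 500, 1000 and 5000 but is labeled
    "div10000" (an assumed ordering). Sizes that are not multiples of
    100 fall outside the 7 groups and are labeled "odd_lot".
    """
    if shares == 100:
        return "100"
    if shares == 200:
        return "200"
    for divisor in (10000, 5000, 1000, 500):
        if shares % divisor == 0:
            return f"div{divisor}"
    if shares % 100 == 0:
        return "other_100"
    return "odd_lot"


# Made-up sizes, only to show the call.
sizes = [100, 200, 300, 500, 900, 1000, 1500, 2000, 5000, 10000, 20000, 250]
print(Counter(size_group(s) for s in sizes))
```

Run over the actual trade records, the counts per group would give the empirical frequencies behind the observation that 1,000-share trades are far more common than 900-share trades.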

To find a common equation that combines them into one, I used an OLAP server, which lets you slice and dice the data. The Excel sheets are attached to this file, and you should be able to connect to the live data after reading the tutorial in the Portfolix.com demo.
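One way to write such a combined equation, offered only as an assumption and not as a result of the OLAP analysis, is a finite mixture. If every trade size $v$ belongs to exactly one of the 7 groups $G_1, \dots, G_7$, the law of total probability gives

$$
P(V = v) = \sum_{j=1}^{7} w_j \, f_j(v), \qquad w_j = P(V \in G_j), \qquad f_j(v) = P(V = v \mid V \in G_j),
$$

where each $f_j$ is zero outside its group and the weights $w_j$ sum to 1. Fitting it means choosing a functional form for each $f_j$ (for the 100-share and 200-share groups it is a single point mass) and estimating the weights.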

You might find the following useful: