Archive for the ‘python’ Category

Turkish Deasciifier - Google App Engine version

The Google App Engine version of Turkish Deasciifier is ready. You can try it at http://turkceyap.appspot.com/

This is version 0.1, and it runs the actual Python implementation. I created it so that people who don’t want to install a Firefox add-on or Google Chrome extension can give it a try, too.

For related news and updates you can check http://ileriseviye.org/blog/?tag=turkish-deasciifier

24
Jul

turkish-deasciifier: Added to Softpedia

   Posted by: Emre Sevinc

I’ve just received an e-mail from the Softpedia editorial team about my Python implementation of Deniz Yüret’s Turkish deasciifier:

Congratulations,

Turkish Deasciifier, one of your products, has been added to Softpedia’s database of software programs for Linux. It is featured with a description text, screenshots, download links and technical details on this page:
http://linux.softpedia.com/get/Text-Editing-Processing/Others/Turkish-Deasciifier-58739.shtml

The description text was created by our editors, using sources such as text from your product’s homepage, information from its help system, the PAD file (if available) and the editor’s own opinions on the program itself.

For related posts please visit http://ileriseviye.org/blog/?tag=turkish-deasciifier

I’ve recently added my Python implementation of the Turkish deasciifier to the Python Package Index (PyPI). You can see the details at: http://pypi.python.org/pypi/Turkish Deasciifier/

Now I’m in the process of creating a Firefox plug-in using the Jetpack SDK. This will make the deasciifier much easier for end users to install and use.

For turkish-deasciifier related posts please visit http://ileriseviye.org/blog/?tag=turkish-deasciifier

PS: One of the first people who provided feedback about my PyPI package was a French programmer who was also studying Turkish. I’m glad to see that my efforts help people who are not native Turkish speakers, too.

I have recently finished porting Deniz Yüret’s Turkish deasciifier, turkish-mode (originally implemented in Emacs Lisp), to Python. The source code is available at http://github.com/emres/turkish-deasciifier.

For those who are puzzled by the term ‘deasciification’: it is the process of converting a Turkish text that is written using only ASCII letters into a Turkish text with the correct Turkish letters. For example, if your ASCII-only Turkish text is:
Read the rest of this entry »
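
To make the term more concrete, here is a minimal sketch of the opposite direction, asciification, which is a plain character mapping. It is not the algorithm used by turkish-deasciifier; the names `asciify` and `ambiguous_positions` and the sample word are made up for this illustration. Deasciification is the harder problem because every "c", "g", "i", "o", "s" or "u" in the ASCII text may or may not stand for a Turkish letter, and something has to decide which.

```python
# Illustrative only: NOT the turkish-deasciifier algorithm.
# Asciification (Turkish -> ASCII) is a fixed character mapping.
TURKISH_TO_ASCII = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")

# ASCII letters that might stand for a Turkish letter in asciified text.
AMBIGUOUS = set("cgiosuCGIOSU")


def asciify(text):
    """Replace Turkish-specific letters with their ASCII look-alikes."""
    return text.translate(TURKISH_TO_ASCII)


def ambiguous_positions(ascii_text):
    """Return the indexes a deasciifier would have to make a decision about."""
    return [i for i, ch in enumerate(ascii_text) if ch in AMBIGUOUS]


word = "çağrı"                       # sample word chosen for this sketch
ascii_word = asciify(word)           # the Turkish letters are lost here
print(ascii_word, ambiguous_positions(ascii_word))
```

Going back from the ASCII form to the original requires choosing, at each of those positions, between the ASCII letter and its Turkish counterpart, which is the job of the deasciifier.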

23
May

Comparing top 100 Dutch words to Zipf’s law

   Posted by: Emre Sevinc

Recently I was involved in a project related to the website of the Universiteit Antwerpen (UA). As part of my task, I developed a system in Python to count the frequency of every word that occurs across the UA website. After spending a few days playing with the data my system produced, I remembered an interesting mathematical law that is supposed to apply to natural languages, namely Zipf’s law:

“Zipf’s law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc.

Zipf’s law is most easily observed by plotting the data on a log-log graph, with the axes being log(rank order) and log(frequency).”

So I decided to give my data set a try and see how well it conforms to Zipf’s law. I plotted two curves: the straight line is the ideal case of Zipf’s law, and the other (blue) one is the actual data:

Zipf's law and top 100 Dutch words
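
The comparison behind such a graph can be sketched in a few lines of Python. This is not the system described above, only a minimal illustration: the function names, the tokenizing regular expression and the tiny sample sentence are all assumptions made for the example. It counts word frequencies, ranks them, and computes the ideal Zipf value for each rank, taking the frequency of the top word as the starting point.

```python
import re
from collections import Counter


def top_word_frequencies(text, n=100):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    # Assumed tokenization: runs of letters, lower-cased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(n)


def ideal_zipf(top_count, n):
    """Ideal Zipf frequencies: the word of rank r occurs top_count / r times."""
    return [top_count / rank for rank in range(1, n + 1)]


# Tiny made-up sample; a real corpus would be far larger.
sample = "de kat en de hond en de vis"
ranked = top_word_frequencies(sample, n=3)
expected = ideal_zipf(ranked[0][1], len(ranked))

for rank, ((word, count), ideal) in enumerate(zip(ranked, expected), start=1):
    print(rank, word, count, ideal)
```

Plotting the logarithm of the rank against the logarithm of both the observed and the ideal counts gives the two curves; the ideal one is a straight line on such a log-log graph.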


Read the rest of this entry »