The basic data structures of lda-c
corpus
- docs[]
- num_terms :vocabulary size; every word id lies in [0, num_terms)
- num_docs :the number of documents in the corpus
doc
- words[] :(type:int) the ids of the distinct words that occur in this document
- counts[] :(type:int) counts[n] is how many times words[n] occurs in this document
- length :the number of distinct words in this document (the size of words[] and counts[])
- total :the total number of word tokens in this document, i.e. the sum of counts[]
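A minimal C sketch of the corpus and doc layout described above. The struct field names follow these notes; the helpers doc_total and corpus_num_terms are hypothetical, and deriving num_terms as the largest word id plus one is an assumption about how the vocabulary size is obtained.

```c
/* Sketch of the doc/corpus layout; field names follow the notes above. */
typedef struct {
    int *words;   /* ids of the distinct words in the document */
    int *counts;  /* counts[n] = number of occurrences of words[n] */
    int length;   /* number of distinct words (size of words[] and counts[]) */
    int total;    /* total tokens = sum of counts[] */
} document;

typedef struct {
    document *docs;
    int num_terms;  /* vocabulary size: every word id lies in [0, num_terms) */
    int num_docs;   /* number of documents in docs[] */
} corpus;

/* Hypothetical helper: total is the sum of the per-word counts. */
int doc_total(const document *doc) {
    int total = 0;
    for (int n = 0; n < doc->length; n++)
        total += doc->counts[n];
    return total;
}

/* Hypothetical helper (assumption): the smallest num_terms that covers
   every word id seen in the corpus, i.e. max id + 1. */
int corpus_num_terms(const corpus *c) {
    int num_terms = 0;
    for (int d = 0; d < c->num_docs; d++)
        for (int n = 0; n < c->docs[d].length; n++)
            if (c->docs[d].words[n] + 1 > num_terms)
                num_terms = c->docs[d].words[n] + 1;
    return num_terms;
}
```

For example, a document whose distinct word ids are {0, 4, 7} with counts {2, 1, 3} has length 3 and total 6.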
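lda_model
- alpha :Dirichlet hyperparameter on each document's topic proportions (a single scalar shared by all topics)
- log_prob_w[NTOPICS][num_terms] :log p(word w | topic k), the topic ~ words distribution stored as logs; log_prob_w[k][w] = log(ss->class_word[k][w] / ss->class_total[k])
- num_topics :(NTOPICS) the number of topics to train
- num_terms :vocabulary size (same as the corpus num_terms)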
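ss - sufficient statistics (lda_suffstats)
- class_word[NTOPICS][num_terms] :expected number of times word w is assigned to topic k; can be randomly initialized before the first EM iteration
- class_total[NTOPICS] :class_total[k] is the sum of class_word[k][w] over all words w
- alpha_suffstats :accumulated over documents as sum_k digamma(var_gamma[d][k]) - NTOPICS * digamma(sum_k var_gamma[d][k]); used to re-estimate alpha
- num_docs :the number of documents accumulated into these statistics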
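A sketch of how class_word and class_total could be given a random start. The scheme (1.0/num_terms plus a uniform draw from rand() for each entry) and the function random_init_suffstats are assumptions for illustration; the invariant it preserves is that class_total[k] is always the row sum of class_word[k].

```c
#include <stdlib.h>

/* Hypothetical helper: random start for the sufficient statistics.
   Assumed scheme: each entry is 1.0/num_terms plus a uniform value in [0, 1]. */
void random_init_suffstats(double **class_word, double *class_total,
                           int num_topics, int num_terms, unsigned seed) {
    srand(seed);
    for (int k = 0; k < num_topics; k++) {
        class_total[k] = 0.0;
        for (int w = 0; w < num_terms; w++) {
            class_word[k][w] = 1.0 / num_terms + (double)rand() / RAND_MAX;
            class_total[k] += class_word[k][w];  /* keep the row sum in sync */
        }
    }
}
```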
var_gamma[num_docs][NTOPICS]
- doc ~ topics :variational Dirichlet parameters for each document's topic proportions
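var_gamma is filled one document at a time from phi. A minimal sketch of the standard LDA variational update gamma[k] = alpha + sum_n counts[n] * phi[n][k], assuming a symmetric scalar alpha; update_gamma is a hypothetical name. Multiplying by counts[n] is what lets phi keep one row per distinct word (length) instead of one per token (total).

```c
/* Hypothetical helper: variational update for one document's row of var_gamma.
   doc_gamma[k] = alpha + sum over distinct words n of counts[n] * phi[n][k]. */
void update_gamma(double *doc_gamma, double **phi, const int *counts,
                  int length, int num_topics, double alpha) {
    for (int k = 0; k < num_topics; k++) {
        doc_gamma[k] = alpha;
        for (int n = 0; n < length; n++)
            doc_gamma[k] += counts[n] * phi[n][k];
    }
}
```

If phi is uniform (1/NTOPICS in every cell), this reduces to gamma[k] = alpha + total / NTOPICS.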
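phi[max_corpus_length][NTOPICS]
- word ~ topics :variational topic distribution for each distinct word of the document being processed; it has as many rows as the longest document has distinct words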