Abstract: Recent advances in technology allow for the collection and storage of vast amounts of data in many different areas. Data mining is the process of discovering new and useful information from data. Many techniques have been developed in recent years for the analysis of large datasets, but the task of assessing the significance of the discovered patterns, and the validity of forecasts based on these discoveries, is becoming a major challenge in data-intensive applications. The objective of this thesis is the development of rigorous and efficient techniques for mining significant patterns in three important scenarios. The first scenario is the discovery of frequent itemsets from transactional datasets. For this problem we study two primitives: (i) the extraction of top-$K$ frequent closed itemsets, recently proposed as an alternative to the extraction of frequent itemsets that provides better control over the output size, which is one of the main challenges of the traditional problem; and (ii) the use of sampling for the discovery of top-$K$ frequent items/itemsets. The notion of top-$K$ frequent items/itemsets is an attempt to enhance the effectiveness of the frequent items/itemsets framework by relating frequency to ranking rather than to a mere threshold. For both primitives we develop algorithms and provide experimental evidence of their effectiveness. We then address the problem of identifying a meaningful support threshold such that the itemsets that are frequent w.r.t. this threshold can be flagged as statistically significant with a small false discovery rate (FDR), defined as the expected ratio of false discoveries among all discoveries. A crucial feature of our approach is that, unlike most previous work, it takes into account the entire dataset rather than individual discoveries. Experimental results are reported to show the effectiveness of our approach. The second scenario is the discovery of patterns, called motifs, that repeat frequently, possibly with some errors, in biological sequences. This problem has attracted wide interest in recent years, since sequence similarity is often a necessary condition for functional correlation. We introduce a simple and flexible measure that bounds the number of errors allowed in a motif. We design an algorithm to extract the maximal dense motifs of a sequence. Moreover, we compare the extracted motifs with the ones found by another algorithm, showing that our approach can identify more significant motifs according to the $z$-score, a widely employed measure of significance. The last scenario we consider is the discovery of significantly mutated pathways in large-scale gene and protein interaction networks, a problem of increasing importance in cancer studies.
Our computational approach is the first, to our knowledge, to demonstrate a computationally efficient strategy for the identification of mutated subnetworks, and it efficiently finds significantly mutated pathways. Moreover, we test it on a human protein-protein interaction network using mutation data from studies of different types of cancer. The tests show that our methods correctly identify pathways implicated in cancer.