Cassandra data model for a correlation application


I have been going through the Cassandra documentation and videos on YouTube for a few weeks.

I am implementing a log storage and correlation system, and I want to use Cassandra for it. However, I cannot wrap my head around the Cassandra data model for this kind of application, and I have not been able to find many in-depth examples of the best data model for it.

The logging system revolves around HTTP web traffic. My log sources will expand over time, but for now they will include proxy logs, app logs and some other system logs that contain hostname / IP and event data.

My correlations revolve around source and destination IP addresses and host names, domain names, geographic location, HTTP method (GET / PUT / CONNECT), and requested file type (for example .jar, .exe, .pdf). There are some other potential correlations as well. Relationships over time will also be important in all these cases.

I have read in many places that data modeling for Cassandra starts with thinking about the queries you will be running. That is why I have listed some examples here. There are more, but the following should be a good start, and any further queries will follow similar correlation patterns.

Example query 1: Show me where IP 10.0.0.1 has visited in the last 24 hours, or the log entries with a .jar extension in the URL within the last week

Example query 2: Show me all PUT requests going to the domain xyz.com in the last 24 hours

Example query 3: Show all log events for 19 21.18.1.1 from time 0 through a given time

Example query 4: Show me all communications between 10.0.0.1 and 1916.1.1.1 yesterday.

Example query 5: Compare all the new events against the current list of domains and IPs and show me new events with these IPs and domains.

If necessary, I can provide more information. Any guidance would be helpful.

Thank you!

For query 1, use the IP address as the partition key, with the timestamp and file type as clustering columns. This way you can query for an IP with a file type and a timestamp range.
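As a rough sketch of that layout, assuming hypothetical table and column names (`events_by_ip`, `src_ip`, `file_type`, `event_time`, `url`) that do not come from the question: the CQL below places `file_type` ahead of `event_time` in the clustering order, because a range restriction on a clustering column requires the clustering columns before it to be restricted by equality. The small in-memory class only imitates how one partition is read as a sorted slice; it is not a Cassandra client.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical schema; all names are illustrative.
CREATE_EVENTS_BY_IP = """
CREATE TABLE IF NOT EXISTS events_by_ip (
    src_ip     inet,
    file_type  text,
    event_time timestamp,
    url        text,
    PRIMARY KEY ((src_ip), file_type, event_time)
)
"""

SELECT_BY_IP_TYPE_TIME = """
SELECT event_time, url FROM events_by_ip
WHERE src_ip = ? AND file_type = ? AND event_time >= ? AND event_time < ?
"""


class EventsByIp:
    """In-memory stand-in: one sorted list of rows per partition key."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, src_ip, file_type, event_time, url):
        rows = self._partitions[src_ip]
        rows.append((file_type, event_time, url))
        rows.sort()  # keep rows ordered by the clustering columns

    def select(self, src_ip, file_type, start, end):
        """Rows for one IP and file type with start <= event_time < end."""
        rows = self._partitions.get(src_ip, [])
        keys = [(ft, ts) for ft, ts, _ in rows]
        lo = bisect_left(keys, (file_type, start))
        hi = bisect_left(keys, (file_type, end))
        return rows[lo:hi]


now = datetime(2024, 1, 8, 12, 0)  # fixed "now" so the example is repeatable
table = EventsByIp()
table.insert("10.0.0.1", "jar", now - timedelta(days=2), "http://example.test/a.jar")
table.insert("10.0.0.1", "jar", now - timedelta(days=10), "http://example.test/old.jar")
table.insert("10.0.0.1", "pdf", now - timedelta(days=1), "http://example.test/b.pdf")

last_week_jars = table.select("10.0.0.1", "jar", now - timedelta(days=7), now)
print([url for _, _, url in last_week_jars])
```

With this clustering order, a time range across all file types is no longer a single contiguous slice, so the "last 24 hours" half of query 1 would be better served by a table clustered on the timestamp alone.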

For query 2, you may need the domain and method (PUT, GET, ...) as a composite partition key, with the timestamp as a clustering column.
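A minimal sketch of what this could look like, again with hypothetical names (`requests_by_domain_method`, `domain`, `method`, `event_time`). Because both parts of a composite partition key must match exactly, the helper normalizes the method to upper case before building the bind values for a 24-hour window; that normalization is an assumption about how the data would be written.

```python
from datetime import datetime, timedelta

# Hypothetical schema; all names are illustrative.
CREATE_REQUESTS_BY_DOMAIN_METHOD = """
CREATE TABLE IF NOT EXISTS requests_by_domain_method (
    domain     text,
    method     text,
    event_time timestamp,
    src_ip     inet,
    url        text,
    PRIMARY KEY ((domain, method), event_time)
)
"""

SELECT_LAST_WINDOW = """
SELECT event_time, src_ip, url FROM requests_by_domain_method
WHERE domain = ? AND method = ? AND event_time >= ? AND event_time < ?
"""


def last_24h_params(domain, method, now):
    """Bind values for SELECT_LAST_WINDOW: the window is [now - 24h, now)."""
    return (domain, method.upper(), now - timedelta(hours=24), now)


params = last_24h_params("xyz.com", "put", datetime(2024, 1, 8, 12, 0))
print(params)
```

Note that with this key, asking for every method on a domain means one query per method, since each (domain, method) pair is its own partition.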

For query 3, use the IP as the partition key and the timestamp as a clustering column, and add a UUID or request ID as a further clustering column if required to keep the primary key unique.
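The reason for the extra ID is that Cassandra writes are upserts: two log events with the same IP and the same timestamp collapse into one row unless something else in the primary key tells them apart. A small sketch of that effect, using a plain dict as a stand-in for the table and `uuid.uuid4()` as an assumed event ID (a `timeuuid` column is another common choice):

```python
import uuid
from datetime import datetime


def write(table, primary_key, payload):
    """Mimic an upsert: a write with an existing primary key replaces the row."""
    table[primary_key] = payload


ts = datetime(2024, 1, 8, 12, 0, 0)
ip = "10.0.0.1"  # illustrative address

# Primary key (ip, timestamp): the second event overwrites the first.
without_id = {}
write(without_id, (ip, ts), "GET /a.jar")
write(without_id, (ip, ts), "GET /b.pdf")

# Primary key (ip, timestamp, event_id): both events survive.
with_id = {}
write(with_id, (ip, ts, uuid.uuid4()), "GET /a.jar")
write(with_id, (ip, ts, uuid.uuid4()), "GET /b.pdf")

print(len(without_id), len(with_id))  # 1 2
```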

For query 4, use IP A and IP B as a composite partition key, with the timestamp as a clustering column. In this case, if you need to find communication in both directions, you have to store each event under (IP B, IP A) too.
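One way to picture the double write, as an in-memory sketch rather than real Cassandra code. The class name, field layout, and the second address (`10.0.0.2`) are made up for illustration: each event is stored under both orderings of the pair, so a lookup works regardless of which IP is given first.

```python
from collections import defaultdict
from datetime import datetime


class CommsByPair:
    """In-memory stand-in for a table partitioned by (ip_a, ip_b)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, src_ip, dst_ip, event_time, detail):
        row = (event_time, src_ip, dst_ip, detail)
        # Write under both orderings; the set avoids a duplicate when src == dst.
        for key in {(src_ip, dst_ip), (dst_ip, src_ip)}:
            self._partitions[key].append(row)
            self._partitions[key].sort()  # clustered by timestamp

    def select(self, ip_a, ip_b, start, end):
        """Rows for the pair with start <= event_time < end, oldest first."""
        rows = self._partitions.get((ip_a, ip_b), [])
        return [r for r in rows if start <= r[0] < end]


day_start = datetime(2024, 1, 7)
day_end = datetime(2024, 1, 8)

comms = CommsByPair()
comms.insert("10.0.0.1", "10.0.0.2", datetime(2024, 1, 7, 9, 30), "GET /index.html")
comms.insert("10.0.0.2", "10.0.0.1", datetime(2024, 1, 7, 9, 31), "PUT /upload")
comms.insert("10.0.0.1", "10.0.0.2", datetime(2024, 1, 6, 23, 0), "GET /old")

yesterday = comms.select("10.0.0.2", "10.0.0.1", day_start, day_end)
print([detail for _, _, _, detail in yesterday])
```

The doubled write costs extra storage in exchange for answering the question from a single partition.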

For query 5, you have to do this in the client application.
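A minimal sketch of such client-side matching, assuming events arrive as dicts with hypothetical `src_ip`, `dst_ip`, and `domain` fields and that the watchlists fit in memory as sets:

```python
def match_watchlist(events, watch_ips, watch_domains):
    """Return events whose source IP, destination IP, or domain is on a watchlist."""
    hits = []
    for event in events:
        if (event.get("src_ip") in watch_ips
                or event.get("dst_ip") in watch_ips
                or event.get("domain") in watch_domains):
            hits.append(event)
    return hits


# Illustrative data only.
watch_ips = {"10.0.0.1"}
watch_domains = {"xyz.com"}
new_events = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.7", "domain": "example.test"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.8", "domain": "xyz.com"},
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "domain": "example.test"},
]

hits = match_watchlist(new_events, watch_ips, watch_domains)
print(len(hits))  # 2
```

Set membership keeps each check constant time, so the cost grows with the number of new events rather than with the size of the lists.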

