Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16789201666
by Contributor II
  • 2693 Views
  • 1 reply
  • 0 kudos

What does this common Hyperopt error mean: "There are no evaluation tasks, cannot return argmin of task losses"?

This means that no trial completed successfully. This almost always means that there is a bug in the objective function, and every invocation is resulting in an error. See the error output in the logs for details. In Databricks, the underlying error ...

Latest Reply
tj-cycyota
New Contributor III
  • 0 kudos

The fmin function should be of the form:

    def evaluate_hyperparams(params):
        """
        This method will be passed to `hyperopt.fmin()`. It fits and evaluates
        the model using the given hyperparameters to get the validation loss.
        :param params: This d...

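For reference, a minimal runnable sketch of the shape the reply describes: an objective that returns a loss with STATUS_OK so trials actually complete and fmin has an argmin to return. The toy objective and search space here are illustrative assumptions, not from the thread.

    from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

    def evaluate_hyperparams(params):
        # Toy objective: minimize (x - 3)^2. A real version would train a model
        # and return its validation loss. If this function raises on every call,
        # no trial succeeds and fmin fails with the argmin error above.
        loss = (params["x"] - 3.0) ** 2
        return {"loss": loss, "status": STATUS_OK}

    trials = Trials()
    best = fmin(
        fn=evaluate_hyperparams,
        space={"x": hp.uniform("x", -10, 10)},
        algo=tpe.suggest,
        max_evals=20,
        trials=trials,
    )
    print(best)  # best hyperparameters found, e.g. {'x': 3.02}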
User16789201666
by Contributor II
  • 1113 Views
  • 0 replies
  • 0 kudos

Hyperopt: how to set up categorical vs. numerical hyperparameters?

Use hp.quniform (“quantized uniform”) or hp.qloguniform to generate integers. hp.choice is the right choice when, for example, choosing among categorical options (which might in some situations even be integers, but not usually). https://databricks.com/b...

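To make the distinction concrete, a short sketch with hypothetical hyperparameter names; note that hp.quniform returns floats, so integer-valued parameters still need an int() cast inside the objective function.

    from hyperopt import hp

    space = {
        # numerical: quantized uniform over [2, 10] in steps of 1 (returned as floats)
        "max_depth": hp.quniform("max_depth", 2, 10, 1),
        # numerical: log-uniform suits scale-free parameters like learning rates
        "learning_rate": hp.loguniform("learning_rate", -5, 0),
        # categorical: hp.choice picks one of the listed options
        "criterion": hp.choice("criterion", ["gini", "entropy"]),
    }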
User16765131552
by Contributor III
  • 3083 Views
  • 3 replies
  • 0 kudos

COPY INTO: How to add partitioning?

The command COPY INTO from Databricks provides idempotent file ingestion into a Delta table, see here. From the docs, an example command looks like this:

    COPY INTO delta.`target_path` FROM (SELECT key, index, textData, 'constant_value' FROM 'sour...

Latest Reply
Mooune_DBU
Valued Contributor
  • 0 kudos

If you're looking to partition your `target_path` table, then it's recommended to define the partition keys prior to the COPY INTO command (at the DDL level). E.g.:

    // Drop table if it already exists without the partition key defined (OPTIONAL)
    DROP TAB...

2 More Replies
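Putting the reply's two steps together, a sketch in PySpark; the table name, columns, and source path are hypothetical placeholders patterned on the thread's example.

    # 1. Create the Delta target with the partition key declared at the DDL level
    spark.sql("""
        CREATE TABLE IF NOT EXISTS target_table (
            key STRING, `index` INT, textData STRING, constant_value STRING
        ) USING DELTA
        PARTITIONED BY (key)
    """)

    # 2. COPY INTO then ingests idempotently into the already-partitioned table
    spark.sql("""
        COPY INTO target_table
        FROM (SELECT key, `index`, textData, 'constant_value' AS constant_value
              FROM 'source_path')
        FILEFORMAT = PARQUET
    """)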
brickster_2018
by Esteemed Contributor
  • 5304 Views
  • 1 reply
  • 0 kudos

Resolved! Why do I always see "Executor heartbeat timed out" messages in the Spark Driver logs

Often, I see "Executor heartbeat timed out" messages in the Spark driver logs. Sometimes the job fails with this error. Will increasing "spark.executor.heartbeatInterval" help to mitigate the issue?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

It is a common misconception that increasing "spark.executor.heartbeatInterval" will help to mitigate or resolve heartbeat issues. In fact, increasing spark.executor.heartbeatInterval will increase the chance of the error and worsen the situ...

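For context, a sketch of the constraint involved: spark.executor.heartbeatInterval must stay well below spark.network.timeout (defaults 10s and 120s), so raising the interval alone narrows that safety margin; the usual fix is to address the executor-side load (such as long GC pauses) that delays heartbeats. On Databricks these keys belong in the cluster's Spark config; the builder form below is just for illustration.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # default: executors report liveness to the driver every 10 seconds
        .config("spark.executor.heartbeatInterval", "10s")
        # default: must remain much larger than the heartbeat interval
        .config("spark.network.timeout", "120s")
        .getOrCreate()
    )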
jose_gonzalez
by Moderator
  • 1975 Views
  • 3 replies
  • 0 kudos

How to check my streaming job's metrics?

I would like to know if there is a way to keep track of my running streaming job.

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Streaming metrics are available/exposed mainly through 3 ways:

  • Streaming UI, which is available from Spark 3/DBR 7
  • Streaming listener/Observable metrics API
  • Spark driver logs. Search for the string "Streaming query made progress". The metrics are logged...

2 More Replies
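Besides those three, the same progress reports are reachable straight from the query handle in PySpark; a minimal sketch using the built-in rate source and console sink (both illustrative choices):

    import time

    query = (
        spark.readStream.format("rate").load()          # synthetic test source
        .writeStream.format("console").queryName("demo").start()
    )
    time.sleep(10)                                      # let a few micro-batches run

    print(query.lastProgress)          # latest micro-batch metrics (None before the first batch)
    for p in query.recentProgress:     # recent history of progress reports
        print(p["batchId"], p["numInputRows"])
    query.stop()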
User16752244127
by Contributor
  • 1311 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16752244127
Contributor
  • 0 kudos

Yes, it is actually quite fun to build Looker dashboards on top of Delta Lake. Here is a Databricks on Looker tutorial that I created when Databricks on GCP was released.

User16826994223
by Honored Contributor III
  • 640 Views
  • 1 reply
  • 0 kudos

Koalas or PySpark

Should I use PySpark's DataFrame API or Koalas? Which one is recommended? Is there any performance impact if I use Koalas? Is it a little slower than the PySpark API?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

If you are already familiar with pandas and want to leverage Spark for big data, we recommend using Koalas. If you are learning Spark from ground up, we recommend you start with PySpark’s API.

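To illustrate the trade-off, a small sketch assuming a runtime where the koalas package is available (it ships with recent Databricks runtimes; on Spark 3.2+ the same API lives on as pyspark.pandas):

    import databricks.koalas as ks

    # pandas-like syntax, executed distributed on Spark
    kdf = ks.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
    print(kdf.describe())

    # drop down to the native PySpark DataFrame API when needed
    sdf = kdf.to_spark()
    sdf.groupBy().avg("y").show()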
Anonymous
by Not applicable
  • 738 Views
  • 1 reply
  • 0 kudos

Photon usage

How do I know how much of a query/job used Photon?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

If you are using Photon on Databricks SQL:

  • Click the Query History icon on the sidebar.
  • Click the line containing the query you'd like to analyze.
  • On the Query Details pop-up, click Execution Details.
  • Look at the Task Time in Photon metric at the bottom.

Anonymous
by Not applicable
  • 2418 Views
  • 1 reply
  • 0 kudos

Malformed Request Error Message

I received the following error when launching a workspace: MALFORMED_REQUEST: Failed storage configuration validation checks: PUT, LIST, DELETE. How do I fix this?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Check the S3 bucket policy and region. It looks like storage configuration validation is failing.

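One way to narrow this down from your side is to replay the three failing checks against the root bucket; a sketch with a hypothetical bucket name and region (the real validation runs from Databricks' account, so this mainly rules out region mismatches and broad policy problems):

    import boto3

    bucket = "my-databricks-root-bucket"               # hypothetical
    s3 = boto3.client("s3", region_name="us-west-2")   # must match the bucket's region

    s3.put_object(Bucket=bucket, Key="validation-test.txt", Body=b"test")  # PUT
    s3.list_objects_v2(Bucket=bucket, MaxKeys=1)                           # LIST
    s3.delete_object(Bucket=bucket, Key="validation-test.txt")             # DELETE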
Anonymous
by Not applicable
  • 1093 Views
  • 1 reply
  • 0 kudos

Failed E2 Workspace Error Message

My workspace ended up in a FAILED workspace state with one of the following messages:

  • INVALID_STATE: The maximum number of VPCs has been reached.
  • INVALID_STATE: The maximum number of VPC endpoints has been reached.
  • INVALID_STATE: The maximum number of a...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

"The maximum number of xxx has been reached" indicates that you have hit the soft limits for some of the AWS resources in that region. These are mostly soft limits, and you can file a request with the AWS support team to increase them.

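To see how close you are before filing that request, a sketch that counts VPCs in a region (AWS's default soft limit is 5 VPCs per region; the region name is a placeholder):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")
    vpcs = ec2.describe_vpcs()["Vpcs"]
    print(f"{len(vpcs)} VPCs in use (default AWS soft limit: 5 per region)")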
Anonymous
by Not applicable
  • 857 Views
  • 1 reply
  • 0 kudos

E2 Workspace DNS Unreachable

My E2 workspace is in a RUNNING state, but the DNS is unreachable.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Try deleting the RUNNING workspace, waiting 5-10 minutes, and recreating the same workspace. If that doesn't solve the problem, file a support ticket.

Anonymous
by Not applicable
  • 881 Views
  • 1 reply
  • 0 kudos

E2 workspace - Error Message: Malformed Request: Invalid xxx in the HTTP request body

I received one of the following errors: MALFORMED_REQUEST: Invalid xxx in the HTTP request body or MALFORMED_REQUEST: Invalid xxx in body, where xxx is credentials, storage configurations, networks, etc.

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

It denotes that the input payload is not what is expected by the API endpoint for the E2 accounts API. Possible causes include typos in variable values or JSON formatting issues (not providing quotes, etc.).

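As an illustration of a well-formed payload, a sketch that creates a credentials object with an explicit JSON serializer; the account ID, credential name, and role ARN are placeholders, and you should verify the exact request schema against the Account API reference for the object you are creating.

    import json
    import requests

    account_id = "<account-id>"
    payload = {
        "credentials_name": "my-credentials",
        "aws_credentials": {
            "sts_role": {"role_arn": "arn:aws:iam::123456789012:role/my-cross-account-role"}
        },
    }

    resp = requests.post(
        f"https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}/credentials",
        auth=("<account-admin-email>", "<password>"),
        data=json.dumps(payload),  # explicit serialization avoids the quoting mistakes described above
    )
    print(resp.status_code, resp.text)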
Anonymous
by Not applicable
  • 4486 Views
  • 1 reply
  • 1 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 1 kudos

To access these driver log files from the UI, you could go to the Driver Logs tab on the cluster details page. You can also configure a log delivery location for the cluster. Both worker and cluster logs are delivered to the location you specify.

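For the log delivery option, a sketch of the cluster_log_conf fragment from the Clusters API that sets the destination; the bucket path and region are placeholders, and an S3 destination also assumes the cluster has an instance profile that can write there.

    # include this fragment in the cluster spec when creating or editing the cluster
    cluster_spec_fragment = {
        "cluster_log_conf": {
            "s3": {
                "destination": "s3://my-bucket/cluster-logs",
                "region": "us-west-2",
            }
        }
    }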
Anonymous
by Not applicable
  • 1074 Views
  • 1 reply
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

When you remove a user from Databricks, a special backup folder is created in the workspace. More details at https://kb.databricks.com/notebooks/get-notebooks-deleted-user.html

