Data Engineering

Forum Posts

User16765131552
by Contributor III
  • 1362 Views
  • 1 replies
  • 0 kudos

Resolved! Connect to MicroStrategy

Can MicroStrategy be connected to Azure Databricks?

Latest Reply
User16765131552
Contributor III
  • 0 kudos

Found this: Azure Databricks to Microstrategy JDBC/ODBC Setup Tips
Purpose: This is a quick reference for common Microstrategy configuration tips, tricks, and common pitfalls when setting up a connection to Databricks.
Networking: For Azure, we recommend...

User16826994223
by Honored Contributor III
  • 1134 Views
  • 1 replies
  • 1 kudos

File path not recognisable for notebook jobs in DBFS

We are working in IDEs, and once the code is developed we put the .py file in DBFS. I am using that DBFS path (dbfs:/artifacts/kg/bootstrap.py) to create a job, but I get a "notebook not found" error. What could be the is...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

The notebooks you create are not stored in the data plane; they are stored in the control plane. You can import notebooks through the import option in the Databricks UI or using the API. A file placed in DBFS cannot be used to create a notebook job.
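As a hedged sketch of that API route (the workspace URL, token, and paths below are hypothetical placeholders), a locally developed .py file can be imported into the workspace as a source-format notebook via the Workspace Import REST API, and the resulting workspace path can then be used when creating the notebook job:

import base64
import requests

HOST = "https://<databricks-instance>"     # hypothetical workspace URL
TOKEN = "<personal-access-token>"          # hypothetical token
headers = {"Authorization": f"Bearer {TOKEN}"}

# Read the locally developed .py file and import it as a Python source notebook.
with open("bootstrap.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers=headers,
    json={
        "path": "/Shared/kg/bootstrap",    # workspace path (not DBFS) to reference in the job
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)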

User16826994223
by Honored Contributor III
  • 707 Views
  • 1 replies
  • 1 kudos

How do I see all the DataFrame columns if I have more than 1000 columns in a DataFrame

I tried printSchema() on a DataFrame in Databricks. The DataFrame has more than 1500 columns, and apparently the printSchema function truncates the result and displays only 1000 items. How do I see all the columns?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Databricks also shows the schema of a DataFrame when it is created: click the icon next to the name of the variable that holds the DataFrame. If the output exceeds the display limit, I would suggest writing the schema out to a file.
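A minimal sketch of that idea (assuming an existing DataFrame `df` on Databricks; the DBFS output path is a hypothetical example), writing one line per column so the 1000-item display limit does not apply:

# Build one line per column and write the whole schema to DBFS.
schema_lines = [f"{field.name}: {field.dataType.simpleString()}" for field in df.schema.fields]
dbutils.fs.put("dbfs:/tmp/full_schema.txt", "\n".join(schema_lines), True)  # True = overwrite
print(f"{len(schema_lines)} columns written")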

Srikanth_Gupta_
by Valued Contributor
  • 687 Views
  • 1 replies
  • 0 kudos
Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

Yes, we can, using the below code snippet:
spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .load()

User16826994223
by Honored Contributor III
  • 294 Views
  • 0 replies
  • 0 kudos

VM bootstrap and authentication When a VM boots up, it automatically authenticates with Databricks control plane using Managed Identity (MI), a per-VM...

VM bootstrap and authentication
When a VM boots up, it automatically authenticates with the Databricks control plane using Managed Identity (MI), a per-VM credential signed by Azure AD. Once authenticated, the VM fetches secrets from the control plane, in...

User16869510359
by Esteemed Contributor
  • 647 Views
  • 1 replies
  • 0 kudos

Resolved! Can I give partition filter conditions for the VACUUM command similar to OPTIMIZE

For the OPTIMIZE command, I can give predicates, and it's easy to optimize the partitions where the data is added. Similarly, can I specify a "WHERE" clause on the partition for a VACUUM command?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

This is by design: the VACUUM command does not support filters on the partition columns, because partially removing old files could impact the time travel feature.
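As a hedged illustration of the difference (the table name `events` and partition column `event_date` are hypothetical), OPTIMIZE accepts a partition predicate while VACUUM only takes a retention period:

# OPTIMIZE can be limited to specific partitions with a WHERE clause.
spark.sql("OPTIMIZE events WHERE event_date >= '2021-06-01'")

# VACUUM has no WHERE clause; it always considers the whole table, controlled only by retention.
spark.sql("VACUUM events RETAIN 168 HOURS")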

User16826994223
by Honored Contributor III
  • 413 Views
  • 0 replies
  • 0 kudos

Best practices: Hyperparameter tuning with Hyperopt Bayesian approaches can be much more efficient than grid search and random search. Hence, with the...

Best practices: Hyperparameter tuning with Hyperopt
Bayesian approaches can be much more efficient than grid search and random search. Hence, with the Hyperopt Tree of Parzen Estimators (TPE) algorithm, you can explore more hyperparameters and larger ...
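A minimal Hyperopt TPE sketch along these lines (the objective function and search space below are purely illustrative stand-ins for real model training):

from hyperopt import fmin, tpe, hp, STATUS_OK

def objective(params):
    # Illustrative loss only; in practice this would train and evaluate a model.
    loss = (params["x"] - 3.0) ** 2
    return {"loss": loss, "status": STATUS_OK}

search_space = {"x": hp.uniform("x", -10, 10)}

# TPE explores the space adaptively instead of exhaustively, so fewer evaluations are needed.
best = fmin(fn=objective, space=search_space, algo=tpe.suggest, max_evals=50)
print(best)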

User16869510359
by Esteemed Contributor
  • 1233 Views
  • 1 replies
  • 0 kudos

Resolved! How to restart the cluster with new instances?

Whenever I restart a Databricks cluster, new instances are not launched, because Databricks re-uses the instances. However, sometimes it's necessary to launch new instances, for example to mitigate a bad VM issue or maybe to get a patch fr...

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

Currently, there is no direct option to restart a cluster with new instances. An easy hack to ensure new instances are launched is to add cluster tags to your cluster. This will ensure that new instances have to be acquired, as it's not possible to ...
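A rough, hedged sketch of that hack using the Clusters REST API (the workspace URL, token, cluster id, and tag value are hypothetical; note that clusters/edit replaces the cluster spec, so in real use copy every configured attribute returned by clusters/get into the edit payload):

import requests

HOST = "https://<databricks-instance>"   # hypothetical workspace URL
TOKEN = "<personal-access-token>"        # hypothetical token
CLUSTER_ID = "<cluster-id>"              # hypothetical cluster id
headers = {"Authorization": f"Bearer {TOKEN}"}

# 1) Read the current cluster spec.
spec = requests.get(f"{HOST}/api/2.0/clusters/get",
                    headers=headers, params={"cluster_id": CLUSTER_ID}).json()

# 2) Change a custom tag so the previous instances can no longer be reused.
payload = {
    "cluster_id": CLUSTER_ID,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 0),
    "custom_tags": {**spec.get("custom_tags", {}), "instance_refresh": "v2"},
    # NOTE: copy any other attributes (autoscale, spark_conf, init_scripts, ...) from `spec` too.
}

# 3) Push the edited spec, then restart so fresh instances are acquired.
requests.post(f"{HOST}/api/2.0/clusters/edit", headers=headers, json=payload)
requests.post(f"{HOST}/api/2.0/clusters/restart", headers=headers,
              json={"cluster_id": CLUSTER_ID})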

User16826994223
by Honored Contributor III
  • 939 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Output operations on DStreams push the DStream's data to external systems like a database or a file system. The following are the key output operations that can be performed on DStreams:
saveAsTextFiles() - saves the DStream's data as text files.
saveAsObjectFil...
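A minimal DStream output sketch (the socket source, host/port, and output prefix below are hypothetical; it assumes an existing SparkContext `sc`):

from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc, batchDuration=10)       # 10-second micro-batches
lines = ssc.socketTextStream("localhost", 9999)     # hypothetical text source

# Each micro-batch is written out as a set of text files named <prefix>-<time_in_ms>.
lines.saveAsTextFiles("dbfs:/tmp/dstream-output/lines")

ssc.start()
ssc.awaitTermination()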

User16869510359
by Esteemed Contributor
  • 2389 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

Off-heap memory is managed outside the executor JVM. Spark has native support for off-heap memory: it is managed by Spark rather than controlled by the executor JVM, so GC cycles on the executor do not clean up off-heap memory. Databr...
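A small sketch of the relevant Spark settings (the size value is illustrative; on Databricks these would typically be set in the cluster's Spark config rather than in a notebook):

from pyspark.sql import SparkSession

# Enable Spark-managed off-heap memory and cap it at an illustrative 2 GB.
spark = (SparkSession.builder
         .config("spark.memory.offHeap.enabled", "true")
         .config("spark.memory.offHeap.size", "2g")
         .getOrCreate())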

User16869510359
by Esteemed Contributor
  • 622 Views
  • 1 replies
  • 1 kudos
Latest Reply
User16869510359
Esteemed Contributor
  • 1 kudos

At a high level, a VACUUM operation on a Delta table has 2 steps:
1) Identifying the stale files based on the VACUUM command triggered.
2) Deleting the files identified in Step 1.
Step 1 is performed by triggering a Spark job and hence utilizes the resource o...
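As a hedged illustration of the two steps (the table name `events` is hypothetical), DRY RUN performs only the identification step, while a plain VACUUM also performs the deletion:

# Step 1 only: list the stale files that would be removed, without deleting anything.
spark.sql("VACUUM events RETAIN 168 HOURS DRY RUN").show(truncate=False)

# Steps 1 and 2: identify and then actually delete the stale files.
spark.sql("VACUUM events RETAIN 168 HOURS")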

User16826994223
by Honored Contributor III
  • 714 Views
  • 1 replies
  • 0 kudos

Even an unfinished experiment in MLflow is getting saved as finished

When I start the experiment with mlflow.start_run(), even if my script is interrupted or fails before executing mlflow.end_run(), the run gets tagged as finished instead of unfinished. Can anyone help explain why this is happening?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

In a notebook, MLflow tracks the run as the commands execute, and once a command fails or exits, it logs and finishes the run there, even if the notebook fails. However, if you want to continue logging metrics or artifacts to that run, you just need to use...
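A minimal sketch of that pattern (the parameter and metric names are illustrative): using start_run as a context manager marks the run FAILED when an exception escapes, and the same run can be reopened later by its run_id to keep logging:

import mlflow

# The context manager sets the run status to FAILED if an exception escapes the block,
# instead of leaving the run marked FINISHED.
with mlflow.start_run() as run:
    mlflow.log_param("lr", 0.01)     # illustrative parameter
    # ... training code ...

# Reopen the same run later and continue logging to it.
with mlflow.start_run(run_id=run.info.run_id):
    mlflow.log_metric("val_accuracy", 0.91)   # illustrative metric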

User16869510359
by Esteemed Contributor
  • 580 Views
  • 1 replies
  • 0 kudos

Resolved! Why is my streaming job not resuming even though I specified checkpoint directory

I have provided the checkpointLocation as below; however, I see the config is ignored for my streaming query:
option("checkpointLocation", "path/to/checkpoint/dir")

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

This is a common question from many users. If the streaming checkpoint directory is specified correctly, then this behavior is expected. Below is an example of specifying the checkpoint correctly:
df.writeStream
  .format("parquet")
  .option("checkpo...
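A hedged end-to-end sketch (the streaming DataFrame `df` and the DBFS paths are hypothetical): the checkpointLocation has to be set on the writer, and the same path must be reused across restarts for the query to resume:

query = (df.writeStream
           .format("parquet")
           .option("checkpointLocation", "dbfs:/checkpoints/my_query")  # keep stable across restarts
           .option("path", "dbfs:/output/my_query")
           .start())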
