Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

User16826994223
by Honored Contributor III
  • 336 Views
  • 0 replies
  • 0 kudos

Avro file

June 11, 2021

Apache Avro is a data serialization system. Avro provides:
  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).
  • Simple integration with dynamic languages.
...
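As a minimal sketch of how this format is typically read and written from Spark on Databricks using the built-in Avro data source (the paths below are hypothetical placeholders, not from the original post):

# Read and write Avro with the Spark Avro data source; paths are illustrative.
df = spark.read.format("avro").load("/tmp/events.avro")

(df.write
   .format("avro")
   .mode("overwrite")
   .save("/tmp/events_copy.avro"))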

User16765131552
by Contributor III
  • 1805 Views
  • 1 replies
  • 0 kudos

Resolved! Connect to MicroStrategy

Can Azure Databricks be connected to MicroStrategy?

Latest Reply
User16765131552
Contributor III
  • 0 kudos

Found this ... Azure Databricks to MicroStrategy JDBC/ODBC Setup Tips

Purpose: This is a quick reference for common MicroStrategy configuration tips, tricks, and common pitfalls when setting up a connection to Databricks.

Networking: For Azure, we recommend...

User16826994223
by Honored Contributor III
  • 1349 Views
  • 1 replies
  • 1 kudos

File path not recognisable for notebook jobs in DBFS

We are working in IDEs, and once the code is developed we put the .py file in DBFS. I am using that DBFS path (dbfs:/artifacts/kg/bootstrap.py) to create a job, but I get a "notebook not found" error. What could be the is...

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

The actual notebooks that you create are not stored in the data plane but in the control plane. You can import notebooks through the import option in the Databricks UI or by using the API. A notebook placed in DBFS cannot be used to create a job.
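As a rough sketch of the API route, a source file can be imported into the workspace with the Databricks Workspace API; the host, token, and paths below are illustrative assumptions, not values from the original post:

import base64
import requests

# Import a local .py file into the workspace via the Workspace API.
host = "https://<databricks-instance>"
token = "<personal-access-token>"

with open("bootstrap.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/Shared/bootstrap",  # a workspace path, not a DBFS path
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()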

User16826994223
by Honored Contributor III
  • 919 Views
  • 1 replies
  • 1 kudos

How do I see all the DataFrame columns if there are more than 1000 columns in the DataFrame?

I tried printSchema() on a DataFrame in Databricks. The DataFrame has more than 1500 columns, and apparently printSchema truncates the results, displaying only 1000 items. How can I see all the columns?

Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Databricks also shows the schema of the DataFrame when it's created - click on the icon next to the name of the variable that holds the DataFrame. If the output exceeds the display limit, I would suggest writing the schema out to a file.
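One possible sketch of writing the full schema to a file instead of relying on the truncated console output (the output path is a hypothetical placeholder, and df is assumed to be the existing wide DataFrame):

# Dump every column name and type of the DataFrame to a text file.
with open("/dbfs/tmp/schema.txt", "w") as f:
    for field in df.schema.fields:
        f.write(f"{field.name}: {field.dataType.simpleString()}\n")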

Srikanth_Gupta_
by Valued Contributor
  • 921 Views
  • 1 replies
  • 0 kudos
Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

Yes, we can, using the below code snippet:

spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribePattern", "topic.*")
  .load()

User16826994223
by Honored Contributor III
  • 427 Views
  • 0 replies
  • 0 kudos

VM bootstrap and authentication

When a VM boots up, it automatically authenticates with the Databricks control plane using Managed Identity (MI), a per-VM credential signed by Azure AD. Once authenticated, the VM fetches secrets from the control plane, in...

brickster_2018
by Esteemed Contributor
  • 940 Views
  • 1 replies
  • 0 kudos

Resolved! Can I give partition filter conditions for the VACUUM command similar to OPTIMIZE

For the OPTIMIZE command, I can give predicates, and it's easy to optimize the partitions where the data is added. Similarly, can I specify a "WHERE" clause on the partition for a VACUUM command?

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

It's by design: the VACUUM command does not support filters on the partition columns. This is because partially removing the old files can impact the time travel feature.
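To illustrate the contrast, assuming a hypothetical partitioned Delta table named events (the table name and predicate are placeholders):

spark.sql("OPTIMIZE events WHERE date = '2021-06-01'")  # partition predicate is supported
spark.sql("VACUUM events RETAIN 168 HOURS")             # no WHERE clause is supported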

User16826994223
by Honored Contributor III
  • 616 Views
  • 0 replies
  • 0 kudos

Best practices: Hyperparameter tuning with Hyperopt

Bayesian approaches can be much more efficient than grid search and random search. Hence, with the Hyperopt Tree of Parzen Estimators (TPE) algorithm, you can explore more hyperparameters and larger ...
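A minimal sketch of Hyperopt's TPE search; the objective function and search space below are illustrative placeholders, not from the original post:

from hyperopt import fmin, tpe, hp

# Toy objective: minimize (x - 3)^2 over a uniform search space.
def objective(params):
    x = params["x"]
    return (x - 3) ** 2

space = {"x": hp.uniform("x", -10, 10)}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
print(best)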

brickster_2018
by Esteemed Contributor
  • 1549 Views
  • 1 replies
  • 0 kudos

Resolved! How to restart the cluster with new instances?

Whenever I restart a Databricks cluster, new instances are not launched. This is because Databricks re-uses the instances. However, sometimes it's necessary to launch new instances. Some scenarios are to mitigate a bad VM issue or maybe to get a patch fr...

Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Currently, there is no direct option to restart the cluster with new instances. An easy hack to ensure new instances are launched is to add cluster tags to your cluster. This will ensure that new instances have to be acquired, as it's not possible to ...

User16826994223
by Honored Contributor III
  • 1202 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Output operations on DStreams push the DStream's data to external systems like a database or a file system. The following are the key output operations that can be performed on DStreams:
  • saveAsTextFiles() - saves the DStream's data as text files.
  • saveAsObjectFil...
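A brief sketch of these output operations, assuming an existing SparkContext sc and a hypothetical socket source (host, port, and paths are illustrative placeholders):

from pyspark.streaming import StreamingContext

ssc = StreamingContext(sc, batchDuration=10)
lines = ssc.socketTextStream("localhost", 9999)   # illustrative source

lines.saveAsTextFiles("/tmp/stream/output")       # each batch is saved as text files
lines.foreachRDD(lambda rdd: print(rdd.count()))  # arbitrary per-batch action

ssc.start()
ssc.awaitTermination()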

brickster_2018
by Esteemed Contributor
  • 2837 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Off-heap memory is managed outside the executor JVM. Spark has native support for off-heap memory: it is managed by Spark and not controlled by the executor JVM, so GC cycles on the executor do not clean it up. Databr...
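A small sketch of enabling Spark's off-heap memory through configuration at session build time (the size value is an illustrative assumption):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.memory.offHeap.enabled", "true")
         .config("spark.memory.offHeap.size", "2g")  # illustrative size
         .getOrCreate())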

brickster_2018
by Esteemed Contributor
  • 874 Views
  • 1 replies
  • 1 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 1 kudos

At a high level, a VACUUM operation on a Delta table has 2 steps: 1) identifying the stale files based on the VACUUM command triggered, and 2) deleting the files identified in step 1. Step 1 is performed by triggering a Spark job and hence utilizes the resource o...
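A small illustration of the first step in isolation, assuming a hypothetical Delta table named events; DRY RUN lists the files that would be deleted without removing them:

# List stale files only (step 1), without deleting anything; table name is a placeholder.
spark.sql("VACUUM events RETAIN 168 HOURS DRY RUN").show(truncate=False)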

