Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

enichante
by New Contributor
  • 2830 Views
  • 4 replies
  • 5 kudos

Resolved! Databricks: Report on SQL queries that are being executed

We have a SQL workspace with a cluster running that services a number of self service reports against a range of datasets. We want to be able to analyse and report on the queries our self service users are executing so we can get better visibility of...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

Looks like the people have spoken: API is your best option! (thanks @Werner Stinckens​  @Chris Grabiel​  and @Bilal Aslam​ !) @eni chante​ Let us know if you have questions about the API! If not, please mark one of the replies above as the "best answ...

3 More Replies
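For anyone landing here later, the thread's conclusion was to use the API. As a rough sketch against the SQL Query History endpoint (GET /api/2.0/sql/history/queries; the host, token, and warehouse ID below are placeholders, and field names follow the public API docs), a request could be built like this:

```python
import json
import urllib.request

# Placeholder values -- substitute your own workspace URL and personal access token.
HOST = "https://example.cloud.databricks.com"
TOKEN = "dapi-..."

def query_history_request(warehouse_id, start_ms, end_ms, max_results=100):
    """Build (but do not send) a request for the SQL Query History API."""
    body = {
        "filter_by": {
            "warehouse_ids": [warehouse_id],
            "query_start_time_range": {
                "start_time_ms": start_ms,
                "end_time_ms": end_ms,
            },
        },
        "max_results": max_results,
    }
    return urllib.request.Request(
        f"{HOST}/api/2.0/sql/history/queries",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="GET",
    )

req = query_history_request("1234567890abcdef", 0, 1_700_000_000_000)
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON page of query records you can feed into your own reporting tables.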
cristianc
by Contributor
  • 4181 Views
  • 2 replies
  • 2 kudos

Resolved! Is VACUUM operation recorded in the history of the delta table?

Greetings, I have tried using Spark with DBR 9.1 LTS to run VACUUM on my delta table, then DESCRIBE HISTORY to see the operation, but apparently the VACUUM operation was not in the history despite what is stated in the documentation from: https://do...

Latest Reply
cristianc
Contributor
  • 2 kudos

That makes sense, thanks for the reply!

1 More Replies
adnanzak
by New Contributor II
  • 2850 Views
  • 3 replies
  • 0 kudos

Resolved! Deploy Databricks Machine Learning Models On Power BI

Hi Guys. I've implemented a Machine Learning model on Databricks and have registered it with a Model URL. I wanted to enquire if I could use this model on Power BI. Basically the model predicts industries based on client demographics. Ideally I would...

Latest Reply
adnanzak
New Contributor II
  • 0 kudos

Thank you @Werner Stinckens​  and @Joseph Kambourakis​  for your replies.

2 More Replies
DarshilDesai
by New Contributor II
  • 13080 Views
  • 1 replies
  • 3 kudos

Resolved! How to Efficiently Read Nested JSON in PySpark?

I am having trouble efficiently reading & parsing a large number of stream files in PySpark! Context: Here is the schema of the stream file that I am reading in JSON. Blank spaces are edits for confidentiality purposes. root |-- location_info: ar...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 3 kudos

I'm interested in seeing what others have come up with. Currently I'm using pandas json_normalize(), then taking any additional nested statements and using a loop to pull them out and re-combine them.

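To make the json_normalize approach from the reply concrete, here is a minimal sketch in pandas (the sample record and its field names are invented for illustration; on Spark itself you would more typically reach for explode and nested-column selection):

```python
import pandas as pd

# Invented record shaped like a stream event with a nested array field.
record = {
    "id": 1,
    "location_info": [
        {"city": "Seattle", "geo": {"lat": 47.6, "lon": -122.3}},
    ],
}

# One row per element of location_info; nested dicts become dotted
# columns, and the parent id is carried along via meta.
flat = pd.json_normalize(record, record_path="location_info", meta=["id"])
print(list(flat.columns))  # ['city', 'geo.lat', 'geo.lon', 'id']
```

This avoids the hand-written loop for one level of nesting; deeper arrays still need a second pass or an explode step.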
umair
by New Contributor
  • 2217 Views
  • 1 replies
  • 1 kudos

Resolved! Cannot Reproduce Result scikit-learn random forest

I'm running some machine learning experiments in Databricks. For the random forest algorithm, when I restart the cluster, the training output changes each time even though the random state is set. Anyone have any clue about this issue? Note: I tried the sam...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

RF is non-deterministic by its nature. However, as you mentioned, you can control this by using random_state. This will guarantee a deterministic result ON A CERTAIN SYSTEM, but not necessarily across systems. SO has a topic about this, check it out, very ...

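To illustrate the reply's point, a small scikit-learn sketch (the dataset and hyperparameters are arbitrary): with a fixed random_state, repeated fits on the same machine and library version produce identical predictions, while different systems or library versions may still diverge.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, random_state=0)

# Same random_state => identical model within one environment.
preds = []
for _ in range(2):
    clf = RandomForestClassifier(n_estimators=10, random_state=42)
    clf.fit(X, y)
    preds.append(clf.predict(X).tolist())

print(preds[0] == preds[1])  # True on a given system
```

The cross-cluster drift in the question is therefore expected unless the runtime and library versions are pinned identically.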
Anonymous
by Not applicable
  • 2212 Views
  • 1 replies
  • 2 kudos

Issue in creating workspace - Custom AWS Configuration

We have tried to create a new workspace using "Custom AWS Configuration", giving our own VPC (customer-managed VPC), but the workspace failed to launch. We are getting the below error and couldn't understand where the issue is. Workspace...

Latest Reply
Mitesh_Patel
New Contributor III
  • 2 kudos

I'm also getting the same issue. I'm trying to create a E2 workspace using Terraform with Customer-managed VPC in us-east-1 (using private subnets for 1a and 1b). We have 1 network rule attached to our subnets that looks like this:  Similar question ...

BasavarajAngadi
by Contributor
  • 3139 Views
  • 7 replies
  • 9 kudos

Resolved! Hi Experts, I am new to Databricks. I want to know how to copy PySpark data into Databricks SQL Analytics?

If we use two different clusters, one for PySpark transformation code and one for SQL analytics, how do we make permanent tables derived from the PySpark code available for running queries in Databricks SQL Analytics?

Latest Reply
BasavarajAngadi
Contributor
  • 9 kudos

@Aman Sehgal​  Can we write data from data engineering workspace to SQL end point in databricks?

6 More Replies
Users-all
by New Contributor
  • 2474 Views
  • 0 replies
  • 0 kudos

xml module not found error

ModuleNotFoundError: No module named 'com.databricks.spark.xml'. I'm using Azure Databricks, and I've added what I think is the correct library. Status: Installed. Coordinate: com.databricks:spark-xml_2.12:0.13.0

alejandrofm
by Valued Contributor
  • 2335 Views
  • 3 replies
  • 1 kudos

Resolved! Recommendations to execute OPTIMIZE on tables

Hi, I have Databricks running on AWS. I'm looking for a way to know when is a good time to run OPTIMIZE on partitioned tables. Taking into account that it's an expensive process, especially on big tables, how could I know if it's a good time to run it ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Alejandro Martinez​ - If Jose's answer resolved your question, would you be happy to mark his answer as best? That helps other members find the answer more quickly.

2 More Replies
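Building on the thread, one common heuristic (not official guidance) is to look at numFiles and sizeInBytes from DESCRIBE DETAIL and run OPTIMIZE when the average file size falls well below the target file size; the thresholds below are illustrative assumptions:

```python
def should_optimize(num_files: int, size_in_bytes: int,
                    target_file_bytes: int = 128 * 1024 * 1024,
                    min_files: int = 50) -> bool:
    """Rough small-file heuristic: suggest OPTIMIZE when a table has
    many files whose average size is well below the target file size.
    Thresholds here are illustrative, not official guidance."""
    if num_files < min_files:
        return False  # too few files for compaction to pay off
    avg = size_in_bytes / num_files
    return avg < target_file_bytes / 4

# e.g. 2,000 files averaging ~2 MiB each -> worth compacting
print(should_optimize(2000, 2000 * 2 * 1024 * 1024))  # True
```

On Databricks you would feed this from `spark.sql("DESCRIBE DETAIL my_table")` and schedule OPTIMIZE (optionally with ZORDER) only for tables that trip the threshold.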
BenzDriver
by New Contributor II
  • 2179 Views
  • 2 replies
  • 1 kudos

Resolved! SQL command FSCK is not found

Hello there, I currently have the problem of deleted files still being in the transaction log when trying to call a delta table. What I found was this statement: %sql FSCK REPAIR TABLE table_name [DRY RUN]. But using it returned the following error: Error in ...

Latest Reply
RKNutalapati
Valued Contributor
  • 1 kudos

Remove the square brackets and try executing the command: %sql FSCK REPAIR TABLE table_name DRY RUN

1 More Replies
qyu
by New Contributor II
  • 10002 Views
  • 3 replies
  • 3 kudos

Resolved! Need help with this python import error.

I am using databricks runtime 9.1 LTS ML and I got this error when I tried to import Scikit Learn package. I got the following error message:TypeError Traceback (most recent call last) <command-181041> in <module> ...

Latest Reply
qyu
New Contributor II
  • 3 kudos

@Atanu Sarkar​ I am using Databricks Runtime 9.1 ML LTS and the Python version is 3.8.10. I am only running the import statements: from sklearn.metrics import * and from sklearn.preprocessing import LabelEncoder

2 More Replies
danielveraec
by New Contributor III
  • 8910 Views
  • 3 replies
  • 1 kudos

Resolved! Error writing a partitioned Delta Table from a multitasking job in azure databricks

I have a notebook that writes a delta table with a statement similar to the following:match = "current.country = updates.country and current.process_date = updates.process_date" deltaTable = DeltaTable.forPath(spark, silver_path) deltaTable.alias("cu...

Latest Reply
danielveraec
New Contributor III
  • 1 kudos

Initially, the affected table only had a date field as its partition. So I partitioned it by country and date fields. This new partitioning created the country and date directories; however, the old directories of the date-only partition remained and were not de...

2 More Replies
sudhanshu1
by New Contributor III
  • 6362 Views
  • 1 replies
  • 0 kudos

Query to know all tables and columns name in delta lake

Hi all, does anyone know how to write a simple SQL query to get all table and column names? In Oracle we do select * from all_tab_columns. Similarly, in SQL Server we do select * from information_schema.columns. Do we have something like this in Dat...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

To view columns in a table, use SHOW COLUMNS: %sql show columns in <schema_name>.<table_name>. To show all the tables in a schema, use the following PySpark code: %python schema_name = "default" tbl_columns = {} # Get all tables in a schema tables = spar...

Jreco
by Contributor
  • 4853 Views
  • 6 replies
  • 4 kudos

Resolved! messages from event hub does not flow after a time

Hi Team, I'm trying to build a real-time solution using Databricks and Event Hubs. Something weird happens some time after the process starts. At the beginning the messages flow through the process as expected at this rate: please note that the last ...

Latest Reply
Jreco
Contributor
  • 4 kudos

Thanks for your answer @Hubert Dudek​, it is already specified. What do you mean by this? This is the weird part of it: the data is flowing fine, but at some point it's like the job stops reading or something like that, and if I restart the ...

5 More Replies
wpenfold
by New Contributor II
  • 29745 Views
  • 5 replies
  • 2 kudos
Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

Using the workspace API you can list all the notebooks for a given user. The API response will tell you whether the object under the path is a folder or a notebook. If it's a folder, then you can add it to the path and get the notebooks within the folder. Put a...

4 More Replies
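A sketch of the recursion the reply describes, written against a stand-in for the Workspace API's /api/2.0/workspace/list response shape (objects with path and object_type keys); the fake tree below replaces real HTTP calls:

```python
def list_notebooks(list_fn, path="/"):
    """Recursively collect notebook paths. `list_fn(path)` is assumed to
    return Workspace-API-style objects with 'path' and 'object_type'."""
    notebooks = []
    for obj in list_fn(path):
        if obj["object_type"] == "NOTEBOOK":
            notebooks.append(obj["path"])
        elif obj["object_type"] == "DIRECTORY":
            notebooks.extend(list_notebooks(list_fn, obj["path"]))
    return notebooks

# Fake API responses standing in for real workspace/list calls:
tree = {
    "/": [{"path": "/a", "object_type": "DIRECTORY"},
          {"path": "/nb1", "object_type": "NOTEBOOK"}],
    "/a": [{"path": "/a/nb2", "object_type": "NOTEBOOK"}],
}
print(list_notebooks(tree.get))  # ['/a/nb2', '/nb1']
```

In a real script, `list_fn` would wrap an authenticated GET to /api/2.0/workspace/list and return the response's `objects` array.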

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group