Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ramravi
by Contributor II
  • 26693 Views
  • 2 replies
  • 6 kudos
Latest Reply
Rishabh-Pandey
Databricks MVP
  • 6 kudos

Hey @Ravi Teja, there are two methods for limiting a DataFrame: take and limit. For reference: myDataFrame.take(10) returns an Array of Rows. It is an action and collects the data (like collect does). myDataFra...

1 More Replies
Prototype998
by New Contributor III
  • 9265 Views
  • 2 replies
  • 1 kudos

Resolved! Connecting Databricks with FTP server

Hey, I want to know how to connect Databricks to an FTP server. Any help would be really appreciated.

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 1 kudos

Hey @Punit Chauhan, refer to this code to connect to the FTP server: import ftplib; Host=""; Login=""; Passwd=""; ftp_dir=""; ftp = ftplib.FTP(Host); ftp.login(Login, Passwd); ftp.cwd(ftp_dir); files = ftp.nlst(ftp_dir); print(files)
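
The reply's snippet can be laid out as a small helper. This is a minimal sketch, assuming a plain (unencrypted) FTP server that is reachable from the cluster. The function names list_ftp_dir and connect_and_list are made up for illustration, and the host, login, and password values left empty in the reply still have to be supplied by the reader. Unlike the reply, it calls nlst() with no argument after cwd, which lists the directory that was just entered.

```python
import ftplib


def list_ftp_dir(ftp, ftp_dir):
    """Change into ftp_dir on an already logged-in FTP object and return its entries."""
    ftp.cwd(ftp_dir)
    # With no argument, nlst() lists the current working directory.
    return ftp.nlst()


def connect_and_list(host, login, passwd, ftp_dir):
    """Open a connection, log in, list ftp_dir, and close the connection."""
    # ftplib.FTP is a context manager; the connection is closed on exit.
    with ftplib.FTP(host) as ftp:
        ftp.login(login, passwd)
        return list_ftp_dir(ftp, ftp_dir)


# Hypothetical usage; the values below are placeholders, not real credentials:
# files = connect_and_list("ftp.example.com", "user", "secret", "/incoming")
# print(files)
```

Splitting the listing step from the connection step keeps the listing logic easy to try out against a stand-in object before pointing it at a real server.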

1 More Replies
Ajay-Pandey
by Databricks MVP
  • 9605 Views
  • 3 replies
  • 13 kudos

Resolved! Fetching data in excel through delta sharing

Hi all, is there any way to access or push data in Delta Sharing using Microsoft Excel?

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 13 kudos

Hey @Ajay Pandey, yes, a new Excel feature was released recently that lets us enable Delta Sharing from Excel, so whatever changes you make to Delta are automatically saved in the Excel file as well. Refer to this lin...

2 More Replies
Prototype998
by New Contributor III
  • 6456 Views
  • 5 replies
  • 2 kudos

Resolved! reading multiple csv files using pathos.multiprocessing

I'm using PySpark and Pathos to read numerous CSV files and create many DataFrames, but I keep getting this problem. Code for the same: from pathos.multiprocessing import ProcessingPool; def readCsv(path): return spark.read.csv(path, header=True); csv_file_list =...
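
The error itself is not shown in this preview. One common cause with this pattern is that the SparkSession exists only on the driver and cannot be shipped to separate worker processes. A hedged alternative, not taken from the thread, is to issue the reads from threads inside the driver process. The helper name read_all and the max_workers default are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def read_all(paths, reader, max_workers=4):
    """Call reader(path) for every path from driver-side threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths.
        return list(pool.map(reader, paths))


# In a notebook where `spark` and `csv_file_list` already exist, this would be:
# dfs = read_all(csv_file_list, lambda p: spark.read.csv(p, header=True))
```

Threads share the driver's SparkSession, so nothing has to be pickled. If a single combined DataFrame is acceptable, spark.read.csv also accepts a list of paths directly.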

Latest Reply
Prototype998
New Contributor III
  • 2 kudos

@Ajay Pandey @Rishabh Pandey

4 More Replies
ratnakarsinha
by New Contributor II
  • 25208 Views
  • 3 replies
  • 0 kudos

How to get full result using DataFrame.Display method

Hi, the DataFrame display method in a Databricks notebook fetches only 1000 rows by default. Is there a way to change this default so the full result (more than 1000 rows) can be displayed and downloaded in Python? Thanks, Ratnakar.

Latest Reply
ramravi
Contributor II
  • 0 kudos

The display method has no option for choosing the number of rows. Use the show method instead, although its output is not as neat and it does not support visualizations or downloads.

2 More Replies
Trodenn
by New Contributor III
  • 10535 Views
  • 4 replies
  • 1 kudos

How to merge two separate DELTA LIVE TABLE?

So I have two Delta Live Tables: a master table that contains all the prior data, and another table that contains all the new data for that specific day. I want to be able to merge those two tables so that the master table contains would...

Latest Reply
Ajay-Pandey
Databricks MVP
  • 1 kudos

@Rishabh Pandey

3 More Replies
Mahesh_789
by Databricks Partner
  • 1270 Views
  • 0 replies
  • 1 kudos

While accessing the data on recipient side using delta_sharing.load_table_changes_as_spark(), it shows data of all versions.

When I try to access the data of a specific version by setting the arguments to that version number, I get the data of all versions. data1 = delta_sharing.load_table_changes_as_spark(table_url, starting_version=1, ending_version=1); data2 = delta_sharing.load_table...

kmckee
by New Contributor II
  • 1595 Views
  • 0 replies
  • 1 kudos

Trouble Displaying Full Size Images from Spark Dataframe

Hi, I have followed this guide (https://learn.microsoft.com/en-us/azure/databricks/_static/notebooks/image-data-source.html) to successfully load some image data into a Spark DataFrame and display it as a thumbnail. I would like to display a single image fr...

weldermartins
by Honored Contributor
  • 5335 Views
  • 3 replies
  • 6 kudos

Resolved! Function When + Dictionary.

Hey everyone, I want to avoid repeating the when function 12 times, so I thought of using a dictionary. I don't know if it's a limitation of the Spark function or a logic error. Does the function allow this concatenation?

Latest Reply
weldermartins
Honored Contributor
  • 6 kudos

Hello everyone, I found this alternative to reduce repeated code: custoDF = (custoDF.withColumn('month', col('Nummes').cast('string')).replace(months, subset=['month']))

2 More Replies
sfalquier
by New Contributor II
  • 3674 Views
  • 3 replies
  • 0 kudos

HTTP 403 on git-credentials API

Hi, I am trying to set Git credentials for my service principal. I followed the process described here, but I get a 403 error when making the POST request to ${DATABRICKS_HOST}/api/2.0/git-credentials with the service principal token. By the way, I also canno...

Latest Reply
Vivian_Wilfred
Databricks Employee
  • 0 kudos

Hi @Sébastien FALQUIER, it works for me; there are no restrictions. Maybe the PAT you generated for the service principal has expired. Can you generate a new token and try calling the GET git-credentials API? How are you creating the PAT for the service prin...

2 More Replies
martcerv
by New Contributor II
  • 4629 Views
  • 4 replies
  • 2 kudos

Cloud provider launch failure

When I try to create a cluster, I get this error message. Details: AWS API error code: InvalidGroup.NotFound. AWS error message: The security group 'sg-0ded75eefd66bf421' does not exist in VPC 'vpc-0ec7da3d5977f6ec9'. And when I inspect the security groups ...

Latest Reply
AminChad_22427
New Contributor II
  • 2 kudos

Hi, I am running into a similar issue, but in my case the security group was deleted by mistake. Is there a way to make Databricks recreate the missing group? @Kaniz Fatma, where can the CreateSecurityGroup command be run? Does it change the securi...

3 More Replies
sudhanshu1
by New Contributor III
  • 1107 Views
  • 0 replies
  • 0 kudos

Structured Streaming

I need a solution for the problem below. We have a set of JSON files that keep arriving in AWS S3. These files contain details for a property. Please note that one property can have 10-12 rows in a JSON file. A sample JSON file is attached. We need to read...

KVNARK
by Honored Contributor II
  • 5361 Views
  • 4 replies
  • 13 kudos

Resolved! To practice Databricks SQL

Is there any kind of sandbox where we can get hands-on practice with Databricks SQL or run notebooks attached to clusters, apart from the free trial provided by Databricks?

Latest Reply
Harun
Honored Contributor
  • 13 kudos

The Databricks SQL workspace is available only with the Databricks Premium service. If you have an Azure Pass subscription, you can use it to practice.

3 More Replies